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Abstract 

This thesis is devoted to the study of some aspects of perturbative QCD, 
and in particular to the development of high-precision techniques for the 
extraction of physical parameters such as structure functions, parton distri- 
butions, and the strong coupling from the analysis of deep inelastic scattering 
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of such application, we will discuss the determination of the strong coupling 
constant. 
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1 

Introduction 



Quantum Chromo dynamics (QCD) is believed to be the theory of strong 
interactions. QCD as a gauge theory of quarks and gluons is unique among 
renormahzable theories in providing a basis for the parton model within 
the principles of relativistic quantum field theory and at present times it 
stands as a main building block of the "Standard Model" of the fundamental 
interactions. 

The strength of electro- weak interaction is so weak that perturbation the- 
ory is extremely reliable. Furthermore, the leptons are at the same time the 
fields in the Lagrangian and the particles in the detector. The case of QCD 
is different in both respects. Perturbative methods are only applicable in 
those particular domains of strong interactions physics where the asymptotic 
freedom can actually be reached. Although there are several attempts to 
describe non-perturbative effects of QCD, at present times we have not yet a 
solution of QCD in the low-energy domain. Also, QCD is a theory of quarks 
and gluons, while the real world is made up of hadrons. Clearly, some model 
is needed to match one to the other. 

Essentially all physic aspects of present and future hadron colliders, from 
particles searching beyond the Standard Model to electro-weak precision 
measurements and study of heavy quarks, need a detailed information on 
the hadronic initial states. Factorization is a central issue, as it allows to 
separate perturbative and non-perturbative contributions. In particular, it 
ensures that a non-perturbative quantity, as a structure function, is process 
independent. As a consequence, we can extract structure functions from, 
say, deep inelastic scattering processes, and use them as inputs for a hadron- 
hadron collision. Thus, we need a knowledge as precise as possible on the 
inputs of hadron colliders that we extract from other processes. 

The present thesis is devoted to the study of some aspects of perturbative 
QCD, and in particular to the development of high-precision techniques for 
the extraction of phenomenological parameters such as structure functions. 
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parton distributions, and the strong coupling from the analysis of deep in- 
elastic scattering data. This analysis is usually characterized by theoretical 
assumptions on the phenomenological parameters, such as the functional 
parametrizations of the structure functions or the small- a; behavior of parton 
distributions. These assumptions introduce potentially large biases, whose 
size is very hard to assess. 

In this thesis we will develop methods to reduce the sources of theoretical 
uncertainties. Specifically, we will first extend the method of truncated mo- 
ments from its original formulation for the non-singlet parton distributions 
to all flavor combinations. Truncated moments of parton distributions are 
defined by restricting the integration range over the Bjorken variable to an 
experimentally accessible subset < .x < 1 of the allowed kinematic range 
< a; < 1. As a consequence this method provides a way to avoid theoretical 
biases on the small- a; behavior of parton distributions. 

Special attention will be devoted to the numerical implementation of the 
technique of truncated moments. We will write the evolution equations for 
truncated moments in a particular form, which has the advantage of increas- 
ing the efficiency of the method considerably. 

A crucial ingredient of most phenomenological analyses is the technique 
adopted to interpolate experimental data, and to reproduce the correspond- 
ing errors and correlations. We will suggest an approach to this problem 
based on neural networks. We will show that with this method it is pos- 
sible to extract information from experimental data without introducing a 
functional parametrization of structure functions based on theoretical as- 
sumptions. Errors and correlations of the data points will be determined by 
a Monte Carlo technique. We will discuss results and details of the numeri- 
cal implementation, based on a subset of the available experimental data for 
unpolarized deep inelastic scattering. 

The method of truncated moments can be combined with the neural 
network fit to extract various quantities of phenomenological interest in a 
bias-free way. Here, as an application, we will adopt these techniques to 
determine the strong coupling constant. Other possible applications will be 
sketched in the conclusions. 

In Chapters 2 and 3 we will define our frameworks, i.e. QCD and the 
parton model. Besides all the general issues, such as the QCD Lagrangian, 
the Feynman rules, running coupling constant, deep inelastic scattering cross 
section and parton distributions evolution equations, we will discuss in detail 
some technical issues, such as the matching of the running coupling constant 
at quarks mass thresholds and the change of factorization schemes, since they 
are essential ingredients of our analysis. 

In Chapter 4 we will introduce the method of truncated moments. We 
will derive the relevant evolution equations and the corresponding solution. 
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Figure 1.1. General framework. 



We will discuss the numerical accuracy of the technique, and the way it can 
be implementeted for phenomenological applications. 

Chapter 5 is a very brief introduction to neural networks and it is mainly 
devoted to introduce the main algorithms and some practical rules. In Chap- 
ter 6, we will first describe the main features of the experimental data, and 
we will discuss under which conditions we can reproduce them by a Monte 
Carlo technique. Then, we will address to the neural network fit of data, and 
we will give details on the behavior of neural networks, such as their ability 
of finding an underlying law, and on the way we have to train them. 

As an application, the efforts of the previous Chapters will be used in 
Chapter 7 to give a determination of the strong coupling constant as- The 
phenomenology of deep inelastic scattering will be completed with the discus- 
sion of target mass corrections, the renormalization scale dependence, and 
the corrections due to the elastic contributions. The fitting procedure, as 
well as the estimation of the theoretical errors, will be illustrated in detail. 
Finally, in Chapter 8 we will summarize and discuss results and outlook. 
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2 

Basics of QCD 



This chapter is a very basic and brief introduction on QCD. Lots of reviews on 
this subject have been written (see e.g. [0, 0, 0]), and we could not write about 
it better. Here we will limited ourselves to introduce the essential aspects of 
the QCD framework, as the SU{N) gauge invariance, the QCD Lagrangian 
and the Feynman rules. We will pay more attention on the running coupling 
as, as it is an essential tool for Chapter |^. For this purpose, a very technical 
issue as matching conditions on quark mass thresholds will be reviewed with 
details. 

2.1 SU{N) gauge invariance 

We are interested in building a Lagrangian invariant under local phase trans- 
formations related to a non-Abelian group. We consider the Lagrangian 

£ = i^7„a"V (2.1.1) 

and the transformation 

ip{x) U{x)i;{x) (2.1.2) 

where 

U{x) = expliguj^ix)^^] (2.1.3) 

in which the are the A^^ — 1 traceless Hermitean matrices generating SU{N) 
and satisfying the relations 

[T^, T^] = tf^^'^T^, {T^)bc = -if^""^ . (2.1.4) 

By convention the normalization of the SU (N) matrices is chosen to be 

Trt^t^ = T«5^^ Tn = ^. (2.1.5) 
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With this choice, the generators obey the following relations: 

1\t2 _ 1 

Y.^ltt = Cj^S^c, Cf = ^^ (2.1.6) 

A 

Tr T^T^ = ^/^^^/^^^ = Ca5^^, Ca = N. (2.1.7) 

A,B 

As U depends on x, the derivative term daip no longer transforms as it should: 
indeed 

daip dc^Uifj = {daU)ilj + Udalp ^ UOaip. (2.1.8) 

We now look for a generalization of the derivative which does not spoil the 
invariance of C. We define accordingly the covariant derivative by de- 
manding that 

Dai^ UDai^ (2.1.9) 

or in operator form 

D^^D'^ = UD^U-\ (2.1.10) 
Since is to generalize da, let us introduce the ansatz 

Da = dj + igAa{x) (2.1.11) 
where Aa{x) is the N x N Hermitean matrix defined by 

Aa = Ait^. (2.1.12) 
The transformation requirement ( |2.1.10| ) implies that 

a: = UAaU~'^ - -UdaU-\ (2.1.13) 
9 

We have enlarged our Lagrangian in order to have SU{N) local invariance, but 
we have introduced A^^ — 1 vector fields to built the covariant derivative. In 
order to give these fields an existence on their own, we should include their 
kinetic terms in a way that does not break the original local symmetry. The 
Hermitean quantity 

Fafs = -^[Da,Dp] (2.1.14) 

will transform covariantly since Da does, i.e. 

Fafsix) ^ U{x)Fafs{x)U-\x). (2.1.15) 
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Using the definition of tlie covariant derivative ( |2 . 1 . 1 1| ) , we obtain 

Fap = da^Ap - dpA^ + ig [A,, Ap] (2.1.16) 

tliat is tlie Yang-Mills generalization of the field strengths of electromag- 
netism. The kinetic term is then given by 

CYM = -\{Fa.^F^'') (2.1.17) 

with the normalization of eq. ( p.l.5|) for the T-matrices. C does not depend 
on the representation of the fermions and therefore stands on its own as a 
highly non-trivial theory. 

If we consider the equation of motion without sources we have 

D'^F^p = 0. (2.1.18) 

Just as in electrodynamics, there are plane waves solutions and they have in- 
finite energy (but finite energy density). However, unlike in Maxwell's theory, 
they cannot be superimposed to produce finite energy solutions because of 
the non-linear nature of this theory, unless they move in the same direction. 
In addition the fields satisfy the kinematic (Bianchi) constrains 

D^F'^^ = (2.1.19) 

where 

_ le"^75^^^ (2.1.20) 

is the dual of Fajs- We emphasize that ( |2.1.19| ) is not an equation of motion 
since it is trivially solved by expressing F^js in terms of the potentials. 

There is even another way to construct a gauge and Lorentz invariant 
kinetic term from F^/j, that is 

/ = Tr e"^^^F„^F^5. (2.1.21) 
We did not consider this term as it can be expressed as a pure divergence 

^afs^s Tr F^ijF^s = Ad^W^ (2.1.22) 

with 



AsdaAp + '^AsA^Af3 



(2.1.23) 



This means that by taking / as the kinetic Lagrangian, we could not generate 
any equation of motion for the vector potential since it would only affect the 
action at its end points. 
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Finally note that there is no gauge invariant way of including a mass for 
the gauge boson. A term such as 

rri^A^A^, (2.1.24) 

is not gauge invariant. This is very similar to the Quantum Electrodynamics 
(QED) requirement of a massless photon. On the other hand a mass term 
for the fermions given by 

mij^j, (2.1.25) 

is gauge invariant. 



2.2 QCD Lagrangian 

QCD is an SU (3) gauge invariant field theory. The expression for the classical 
Lagrangian density is 

-Cclassical = -^F^pFf + ^ g„ (z ^ - m)ab Qb , (2.2.1) 



flavors 



where p = '~faD'^, the metric is given by f?"^ = (1, —1, —1, —1) and h = c = 1. 
These terms describe the interaction of spin-| quarks of mass m and massless 
spin-1 gluons. is the field strength tensor derived from the gluon field 

F^p = d.Aj - d,At - g f^^^^A^^A^^ (2.2.2) 
and the indices A,B,C run over the eight color degrees of freedom of the 



gluon field. It is the third "non-Abelian" term on the r.h.s. of eq. ( 2.2.2 ) 
which distinguishes QCD from QED, giving rise to triplet and quartic gluon 
self-interactions and ultimately to the property of asymptotic freedom. Note 
that each term in the Lagrangian has mass dimension four, in order to give 
the correct dimensions for the action when integrated over all space-time. It 
follows that the dimensions of the fields qa and A-^ are | and 1, respectively. 
The explicit sum in eq. ( p.2.1|) runs over the nf different flavors of quarks. 



g in eq. (|2.2.2| ) is the coupling constant which determines the strength of the 
interaction between colored quanta, and f^^*^ (A, B,C = 1, . . . , 8) are the 
structure constants of the SU (3) color group. The quark fields are in the 
triplet representation of the color group, (a = 1, 2, 3) and D is the covariant 



derivative ( |2.1.11|) . Acting on triplet and octet fields the covariant derivatives 
takes form 



{D^)ab = dJab + igif" A'i)ab , {D^)aB = OJaB + igiT^" A^) AB , (2.2.3) 



8 



2.3- Feynman rules 



where t and T are matrices in the fundamental and adjoint representations 
of SU{3) respectively. A representation for the generators is provided by 
the eight Gell-Mann matrices A"^, which are Hermitean and trace-less, 



(2.2.4) 



For the specific case of SU{3), from eqs. ( |2.1.6| ) and ( p.l.Tp we have 

4 



3 



(2.2.5) 



2.3 Feynman rules 

In may be shown (see e.g. that it is not possible to define the propagator 

for the gluon field, without making a choice of the gauge. As a consequence 
without a choice of gauge we can not perform perturbation theory with the 
Lagrangian of eq. ( p.2.1| ). The choice 



It {d'-Aff , (2.3.1) 



-^gauge-fixing — 

fixes the class of covariant gauges with gauge parameter A. Two common 
choices are the Feynman gauge, A = 1, and the Landau gauge, A = 0. In 
the following we will consider a general case. In a non Abelian theory such 
as QCD the covariant gauge fixing term must be supplemented by a ghost 
Lagrangian, which is given by 

-Cghost = d^r]^^ {D'XsV'') ■ (2.3.2) 

Here r^^ is a complex scalar field which obeys Fermi statistics. The derivation 
of the form of the ghost Lagrangian is best provided by the path integral 
formalism and the procedures of Faddeev and Popov. The ghost field cancel 
unphysical degrees of freedom which would otherwise propagate in covariant 
gauges. 

Eqs. ( p^.2.1 ),( f2.3.1 ) and ( |2.3.2| ) are sufficient to derive the Feynman rules 



of the theory in a covariant gauge. The Feynman rules are defined from the 
operator 

S = i j d^xC{x) (2.3.3) 

which gives the phase of transition amplitude, rather than from the La- 
grangian density. The Lagrangian density can be separated into a free piece 
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Co, which normally contains all the terms bilinear in the fields, and an inter- 
action piece, £/, which contains all the rest: 

S* = 5*0 + Sj, 



So = i j d'^xCoix), Si = i j <fxCi{x) . (2.3.4) 

The practical recipe to determine the Feynman rules is that the inverse prop- 
agator is derived from —5*0, whereas the Feynman rules for the interacting 
parts of the theory which are treated as perturbations are derived from Si. 

In order to understand this recipe, we will follow and compare the 
following two different approaches to the quantization of a theory. For sim- 
plicity, we will take a theory which contains only a complex scalar field </> 
and an action which contains only bilinear terms, S = 4'*{K + K')(f). In 
the first approach, both K and K' are included in the free Lagrangian, 
So = <P*{^ + K')4>- Using the above rule the propagator A for the (p field is 
given by 

In the second approach K is regarded as the free Lagrangian, 5*0 = 4>*K(I), 
and K' as the interaction Lagrangian, Sj = (j)*K'(j). Now Si is included to 
all orders in perturbation theory by inserting the interaction term an infinite 
number of times: 

^ . z^,(zL)^^(z1)U^)k'(z^)k'(zL),... 



K \ K ) \K ) \ K ) \ K ) \ K ^ 
= ~^ . (2.3.6) 

We observe that with the choice of signs described above the full propagator 
of the (j) field is the same in both approaches, demonstrating the internal 
consistency of the recipe. 

The quark and the gluon propagators are obtained using the free piece 



Co of the QCD Lagrangian given in eq. ( |2.2.1| ). Thus, for example, the 
inverse fermion propagator in momentum space can be obtained by making 
the identification 9" = —ip°' for an incoming field. In momentum space the 
two point function of the quark field depends on a single momentum p. We 
have 

ri?(p) = -^^a6(K/-m), (2.3.7) 



which is the inverse of the propagator given in Fig. |2.1| . The ie prescription 
for the pole of the propagator is added to preserve causality, exactly the same 
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Figure 2.1. Feynman rules for QCD in a covariant gauge for gluons (curly lines), 
fermions (solid lines) and ghosts (dotted lines) as given in Ref. [^]. 

way as in QED. Similarly the inverse propagator of the gluon field is given 

by 



^(2) 

{AB,al3} 



{p) = iS^ 



AB 



(2.3.8) 



It is straightforward to check that without the gauge fixing term this 
function would have no inverse. The result for the gluon propagator A is 



a 



)BC- 



(2.3.9) 
(2.3.10) 
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Replacing the derivatives with the appropriate momenta, eqs. ( |2.2.1| ), ( |2.3.1D 
and (|2.3.2|) can be used to derive all the rules in Fig. |2]l . 



2.4 Running coupling 

The QCD coupling constant is defined by 

a. = ^, (2.4.1) 

that is the strong-interactions analogue of the fine-structure constant. When 
we compute Feynman diagrams divergences arise and we need to regularize 
them. The most commonly used regularization scheme is the Modified Mi- 
nimal Subtraction (MS). This involves continuing momentum integrals from 
4 to 4 — 2e dimensions, and then subtracting off the resulting 1/e poles and 
also log47r — '-/e with 7^; being the Euler-Mascheroni constant. To preserve 
the dimensionless nature of the coupling, a mass scale fi must also be in- 
troduced and g fi'^g. While in QED the coupling constant is defined in 
a natural way by on-shell renormalization, in QCD, we would like to avoid 
discussing on-shell quarks, since these are strongly interacting particles that 
are significantly affected by non-perturbative forces. The use of an arbitrary 
renormalization point allows us to avoid this problem. We will define as 
by renormalization conditions imposed at a large momentum scale /z where 
the coupling constant is small; this value of as can then be used to predict 
the results of scattering processes with any large momentum transfer. 

However, the use of renormalization at a scale /z in a computation invol- 
ving momentum invariants of order p"^ involves some subtlety when and 
/i^ are very different. In this circumstances, Feynman diagrams with n loops 
typically contain correction terms proportional to (a^ log(p^//i^))"'. Fortu- 
nately, we can absorb these corrections into the lowest order terms by using 
the renormalization group to replace the fixed renormalized coupling with a 
running coupling constant. 

2.4.1 The P function 

The running of the coupling constant is defined to satisfy the renormal- 
ization group equation (RGE) 

^a.(t) = /5(t) , t = \og^, (2.4.2) 

where /i is the renormalization scale and Q the process energy. In QCD, the 
f3 function has the perturbative expansion 

(3{t) = -bal{l + b'as + ...), (2.4.3) 
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with 

^ lie. -2.,^ 33 -2n; 

127r 127r ^ ^ 

^ 17C1 - SC^n; - 3Cpn; _ 153 - 19n; 

27r(llC^ - 2nf) 2n{33 - 2nf) ' ^ ■ ■ ) 

where Uf is the number of active hght flavors at the scale Q^. 
In the perturbative region (a^ small), we have 



where C is the integration constant. It follows that 



^ bt-b'log^^ + bC. (2.4.7) 



Neglecting the term b', eq. ( p.4.7|) can be written as 

1 



bt + bC. (2.4.8) 



The choice of the integration constant is arbitrary, and it is usually linked 
to the available scales of the energy. Historically, the flrst choice was done 
in the '60s where the experimental energies were of the order of MeV. With 
this choice we can now write the integration constant as function of the 
normalization scale and of a parameter A as 

C = log^. (2.4.9) 

With this deflnition A represents the scale at which the coupling would di- 
verge, if extrapolated outside the perturbative domain. Then, at the Leading 
Logarithmic Approximation (LLA) we have 

«^^^') = 77^- (2.4.10) 

We will now include the b' term, too. In addition we will also let the 
integration constant be dependent on a scale close to the present experimental 
energies 
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where Mz is the mass of the Z boson. Eq. ( |2.4.7| ) thus gives 

— b log ■ 



with q;o(<5^) satisfying eq. (|2.4.8| ). Finally, 



(2.4.12) 



l-ao(Wlog^ 



(2.4.13) 



Since 



log 



a,(M|; 



aoiQ') l + ao(g>'log^i^ + 0(a^(g2)) 



log(l + a,(M|) 6t) ~ - log(l + ao(M|) bt) , (2.4.14) 



the expression we will use to fit as(M|) is given by 



as(M| 



l + 6a,(M|) log 



(2.4.15) 



Mi 



X 



6'«,(M|) 



l + 6a,(M|) logg 



log l + 6a,(Mi) log 



01 



For sake of completeness we give the Next-to-Logarithmic- Approximation 
(NLA) of function of A 



b log ■ 



1 - 



b' log log 
b log 2^ 



(2.4.16) 



Finally note that A depends on the logarithmic approximation used. We 
have 



Ai 

A2 \b 



(2.4.17) 



where Ai and A2 are defined at LLA and NLA respectively. 



2.4.2 Asymptotic freedom and confinement 

From eq. ( p.4.2| ) and ( |2.4.3| ) we have that, if b > 0, i.e. Uf < 16 as follows 
from eq. (|2.4.4|) , the coupling constant tends to zero at a logarithmic rate as 
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the momentum scale increases. Such theories are called asymptotically free. 
In theories of this class, the short distance behavior is completely solvable by 
Feynman diagram methods. Though ultraviolet divergences appear in every 
order of perturbation theory, from the renormalization group analysis follows 
that the sum of these divergences is completely harmless. If we interpret 
these theories in terms of a bare coupling and a finite cutoff A, eq. ( |2.4.10| ) 
indicates that there is a smooth limit in which tends to zero as A tends 
to infinity. 

To briefly describe the consequences of asymptotic freedom, we now con- 
sider an example coming from electrodynamics. The simplest phase of the 
electrodynamics is the Coulomb phase. It is characterized by massless pho- 
tons which mediate a long range 1/R potential between external sources. 
When charged matter particles are present, electrodynamics can be in an- 
other phase, the superconducting or the Higgs phase. It is characterized by 
the condensation of a charged field 



($) 7^ 0. (2.4.18) 

This condensation creates a gap in the spectrum by making the photon mas- 
sive. This phenomenon was first described in the context of superconducti- 
vity, where $ is the Cooper pair. The condensation of $ makes electric cur- 
rents superconducting. Its effect on magnetic fields is known as the Meissner 
effect. Magnetic fields cannot penetrate the superconductor except in thin 
flux tubes. Therefore, when two magnetic monopoles {e.g. the ends of a 
long magnet) are inserted in a superconductor, the flux lines are not spread. 
Instead, a thin flux tube is formed between them. The energy stored in 
the flux tube is linear in its length and therefore the potential between two 
external monopoles is linear (as opposed to the 1/R potential outside the 
superconductor). Such a linear potential is known as a confining potential. 
The same happens when we consider quark confinement. As a result we have 
that if one attempts to separate a color singlet state into colored compo- 
nents, i.e. to dissociate a meson into a quark and an antiquark, a tube of 
gauge field forms between the two sources. In a non-Abelian gauge theory 
with sufficiently strong coupling this tube has a fixed radius and energy den- 
sity, so the energy cost of separating color sources grows proportionally to 
the separation. A force law of this type can consistently be weak at short 
distances and strong at long distances, accounting for the fact that isolated 
quarks are not observed. An interesting description of confinement from a 
topological point of view can be found in . 
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2.4.3 Thresholds at quark masses 

We now consider the case of a heavy quark with mass m much greater than 
the relevant scale Q. In this case it can be shown that the effects of the 
heavy quark on cross sections are suppressed by inverse powers of m and can 
therefore be ignored for m -C Q [0]. 

In the MS renormalization scheme depends only on the renormalization 
scale. As a consequence heavy quarks contributions to the running of 
must be taken into account even when we compute a physical quantity at an 
energy lower then the heavy quark mass scale. However, logarithms of large 
masses induced by the renormalization group equations in the couplings are 
canceled against other logarithms that appear in the calculation of physical 
observables. This is obviously an inconvenient, since a lot of effort must be 
invested in intermediate stages of a calculation to compute terms that will 
cancel in physical quantities. 

To remedy this problem the standard procedure has been the use of the 
effective field theory language. For example, in QCD with a heavy quark 
and Uf — 1 light quarks, one builds a theory with Uf quarks and an effective 
field theory with nj — 1 quarks. Around the threshold of the heavy quark we 
require agreement of the two theories. This gives a set of matching equations 
that relate the couplings of the theory with n j quarks with the couplings of 
the theory with nj — 1 quarks. This way, below the heavy quark threshold 
one can work with the effective theory, but using effective couplings. Then, 
by construction, decoupling is trivial. This procedure is equivalent to other 
renormalization schemes and allows us to correctly obtain the asymptotic 
value of the coupling constant. The price one has to pay is that coupling 
constants might not be continuous at thresholds. However, the fact that one 
has to use appropriate matching conditions in passing thresholds has been 
frequently kept into account in the running of the QCD coupling constant by 
just taking a continuous coupling constant across thresholds. Then, the final 
results depend strongly on the exact scale one uses to connect the couplings. 
To solve this ambiguity we can vary the matching scale ^th between 1/2 and 2 
times the mass of the heavy quark. It can be shown |p that when appropriate 
matching conditions are taken into account the final answer does not depend 
on the exact iith used to connect the couplings. 

As an example, we take eq. ( |2.4.7| ) at LLA and we require that ^^(Q^) 
should be continuous at = jJ^j^ = kthm?, with m the heavy quark mass 

11 

^+^ = 77^-^-^' t = log^ (2.4.19) 



where index + and — refers to variables above and below the threshold. It 
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follows that 



' - ' (2.4.20) 



Finally, we get 



6v /i?/^ 
The generalized version can be written as 



«-(Q^)fl + ^4^1og^l . (2.4.21) 



«+(Q2) = a^{Q^)+Y.Ck {\og^\ at^\Q^) (2.4.22) 

fc=l ^ ^^thJ 



where 



Ci{x) = ^x (2.4.23) 

OTT 

Let now Q;s(Q^,n/) be the solution of eq. (|2.4.7| ) at LLA with A = A5 
and fixed n/ and let as(„^) be the value of the coupling constant at a given 
rif. Eq. (|2.4.21|) can be written as (see e.g. 0) 

«;(6)(Q') = + C,,, 



«,(5)(Q') = as{Q\h) 



«;(3)(Q') = a;'(Q',3) + C34 + C45, (2.4.24) 



where 



Ces = ^l^ijnl, 5) - a~^{ml, 6) , 

C45 = a;^(m2,5)-a;i(m2,4), (2.4.25) 

C34 = a;^(m^,4) - a;^(m^,3) , 

and we have set ^qth = for simplicity. As an example, we consider 

"5(5) (<5^) - tt5(4)(Q^) = 

= (O'^) - . °'"!!l.-„-,„... (2.4.26) 



as{mlb)-as{ml,4) 
as{m^,5)os(m^,4) 



i2 ,N , ,N «s("^6, 5) - a,(mg, 4) 



«,(Q^ 5) - «,(Q^ 4) + a,{Q\ 4)«,(g^ 5)- 



a,(m^,5)a^(m2,4) 

/n2 

2/^2 



(&5-&4)«.'(g^4)log^. 



17 



Basics of QCD 



This expression coincides with eq. ( ^.4.211 ) with as(5)(<5^) = and 
a,(4)(g2) = a_(g2)_ Note that eqs. (|2A2^ ) and ( gxil hold even at NLA. 

To conclude we observe that, at each threshold also A„^ changes. We now 
fix A = A5 and we will show how different values of A for different nf are 
related to each other. If we consider the NLA definition of A, we have 



t/2 



Q exp 



^-\ogb'asiQ') 



(2.4.27) 



If we require that as{Q^) given by is continuous at a threshold (we consider 
as an example Q = rriiy) 



h log -^ + 61 log log 



h log ^ + 65 log log 



A2 ' 
^^5 



(2.4.28) 



we find 



log 



log 



log 



Ai 



log 



(2.4.29) 



and 



Af^ 



Af-'m, 



-2(65-64) 



log 



^^5 



log 



A^ 



^ Af m- 



2(65-64) 



log 



A2 



(2.4.30) 



it follows that 



A4 



A,, 



A, 



As 
rUb 

rrih 
As 



''5-''4 



log 



log 



A2 



A2 
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Deep inelastic scattering (DIS) has been the process that estabhshed the first 
evidence of partons, and it is the traditional testing ground of perturbative 
QCD. Nowadays, it is no longer the correctness of the theory which come 
into question. Rather, the focus is shifted on the precise determinations 
of the unknown parameters, and the development of rehable computational 
techniques. The parameters to be determined include not only the strong 
coupling as, but also all quantities determined from the non-perturbative 
low-energy dynamics, and that, even though in principle computable, must 
be treated as phenomenological input in the perturbative domain. 

In this Chapter we will briefly discuss deep inelastic scattering and the 
"naive parton model". Then we will show how QCD modifies the simple 
Bjorken scaling property of the parton model, and discuss how these scaling 
violations can be calculated in perturbation theory. We will also discuss 
issues useful for the following Chapters, as the factorization schemes. 

3.1 Deep inelastic scattering and the parton model 

We now consider the scattering of a high-energy charged lepton, say, an elec- 
tron, off a hadron target as shown in Fig. |3.1| . If we label the incoming and 
outgoing lepton four-momenta by and k'^ respectively, the momentum of 
the target hadron (assumed henceforth to be a proton) by and the mo- 
mentum transfer by = k^ — k'^, then the standard deep inelastic variables 
are defined by 

= -<f 

= 

u = p-q = M{E' - E) (3.1.1) 
^ 2z/ 2M{E' - E) 
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e 



N 




/ 

Figure 3.1. Deep inelastic charged lepton-proton scattering. 



q-p E' 

y = 1 = ^ ~ T^' 

k ■ p L 

where the energy variables refer to the target rest frame and M is the proton 
mass. Henceforth we will consider only charged lepton scattering, Ip IX, 
for <^ M|, where the scattering is mediated by the exchange of a virtual 
photon. A review on deep inelastic neutrino scattering can be found e.g. in 
Ref. [0. 

The DIS cross section is given by 



dxdy 



1 



2xFi 



+ 



M 



[l-y)iF2-2xF,)-—xyF2 



(3.1.2) 



where Fi{x, Q'^) are the nucleon structure functions which carry the informa- 
tion on the structure of the target as seen by the virtual photon. The Bjorken 
limit is defined as Q^,^ ^ oo with x fixed. In this limit the structure func- 



tions are observed to obey an approximate scaling law |T^, |13|, i.e. they 
depend only on the dimensionless variable x: 



F,ix,Q' 



FAx) 



(3.1.3) 



Bjorken scaling implies that the virtual photon scatters off point-like con- 
stituents, since otherwise the dimensionless structure functions would depend 
on the ratio Q/Qq, with 1/Qo some length scale characterizing the size of 
constituents. The parton model picture of deep inelastic scattering is most 
easily formulated in the "infinite momentum frame" in which the proton is 
moving very fast, p^ ~ (P, 0, 0, P) with P ^ M. In this frame, we can 
consider a simple model where the photon scatters off a point-like quark con- 
stituent which is moving parallel with the proton and carrying a fraction ^ of 
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its momentum, i.e. = ^p'^. Neglecting the proton mass M, we can write 
eq. (|3.1.2|) as 



Ana' 



[1 + (1 - y f]F^ + 



X 



dxdQ'^ Q4 
In terms of the usual Mandelstam variables 



(F2 - 2xF,] 



(3.1.4) 



xy 



t 

u 



{k-k'f = -Q\ 
ip, - k'f = s{y - 1) 



(3.1.5) 



the matrix element squared for the amplitude of the process 

m + q{p,) - m + q{p'^) 

is given by 



2e^e 



2 + 



t2 



(3.1.6) 



(3.1.7) 



The notation ^ denotes the average (sum) over initial (final) colors and 
spins. Using the standard result for massless 2 —>■ 2 scattering, 



da 



(3.1.8) 



dt IGttP' 

and substituting for the kinematic variables in the matrix element squared, 
gives 



dQ^ Q' 

The mass-shell constraint for the outgoing quark, 

P'q = (Pq + qf = + • g = -2p . g(x - = 



(3.1.9) 



(3.1.10) 



implies ^ = x. By writing dx5{x — ^) = 1, we obtain the double differential 
cross section for the quark scattering process: 



da 



Ana' 



(3.1.11) 



By comparing eqs. ( p.l.4|) and ( p.l.ll|) we get that the structure functions in 
this simple model are 



F2 = xel5{x - = 2a;Fi 
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This result suggests that the structure function ^2(0;) "probes" a quark con- 
stituent with momentum fraction ^ = x. 

The above ideas are incorporated in what is known as the "naive parton 
model" 0: 

• 9(0'^^ represents the probability that a quark q carries a momentum 
fraction between ^ and ^ + dC,, where < ^ < 1; 

• the virtual photon scatters incoherently off the quark constituents. 

Thus the proton structure functions are obtained by weighting the quark 
structure functions with the probability distribution q{C,), 

F2 = 2xF, = V/ d^q{0^el6{x-0 

= 5^e2xg(x). (3.1.13) 



q,q 



The l.h.s. of eq. (|3.1.13|) is the Callan- Gross relation |Tj]) and it is a direct 
consequence of the spin-| property of the quarks. Indeed, we observe that 
the two terms in the square brackets on the r.h.s. of eq. (|3.1.4| ) correspond to 
the absorption of transversely {Fi) and longitudinally {F^ F2 — 2xFi in the 
Bjorken limit) polarized virtual photon. The Callan-Gross relation follows 
from the fact that a spin-| quark cannot absorb a longitudinally polarized 
vector boson. In contrast, spin-0 (scalar) quarks cannot absorb transversely 
polarized vector bosons and so would have Fi = 0, i.e. Fl = F2. Structure 
function measurements show that F^ F2, confirming that spin-| property 
of quarks. Note that in the QCD-improved parton model Fl in only non-zero 
at leading order in perturbation theory, i.e. F^ = 0{as). 



3.2 Factorization 



Factorization allows scattering amplitudes with incoming high-energy had- 
rons to be written as a product of a hard scattering piece and a reminder 
factor which carries the information on the physics at low-energies and mo- 
menta. The first term contains only high-energy and momentum components, 
and, because of asymptotic freedom, it is calculable in perturbation theory. 
The second term describes the non-perturbative physics, and it is given by a 
single process-independent function for each type of parton called the parton 
distribution function (PDF). Factorization is an essential tool as it ensures 
that a parton distribution measured in one process can be used in any other 
hard process. A detailed discussion of factorization is beyond the aim of this 
thesis, hence only a brief description will be given. 
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In the naive parton model, when terms down by powers of are 



neglected, the DIS cross section eq. (|3.1.4| ) can be obtained by the convolution 
of the distribution of a quark parton in the target, with fraction y of the target 
longitudinal momentum, with the point-like cross section for quark-current 
scattering 



da, 



eN 



9=1 



„(0) 



(3.2.1) 



where the index stands for quantities calculated without taking into account 
strong interactions and the convolution with respect to x is defined as 

dy 



if(S)g)ix) 



-fiy)9 , 
y \y 



X 



(3.2.2) 



The qualitative observation that hadrons emerge with a transverse momen- 
tum kx different from zero can be explained with the presence of gluon emis- 
sion. In a parton model without gluons, all final-state jets would be collinear 
with the virtual photon. Their hadron fragments will therefore be nearly 
collinear with the photon, that is, with a spread of of about 300 MeV as 
required by the uncertainty principle for confined quarks. The data clearly 
establish an excess of large fc^ hadrons which are the fragments of the quark 
and gluon jets recoiling against one another. 

We will now we consider QCD corrections to eq. ( |3.2.1| ) at the first 
order of the perturbative expansion in a^. In calculating these partonic cross 
sections we encounter divergences. These divergences are 

• collinear, if a massless parton radiates a collinear massless parton whose 
momentum is proportional to that of the emitting parton; 

• soft, if a parton emits or absorbs a massless parton with zero energy. 
When we sum the virtual contributions to the cross section given by diagrams 



in Fig. p.2| a,b and the real contributions given by diagrams in Fig. |3.2| c,d, 
we obtain that soft singularities are canceled. We are then left with the only 
contribution of collinear divergences that is proportional to 



7^ log — , 

27r /i^ 



(3.2.3) 



where /i^ is an arbitrary scale. If we take the initial parton, for instance, 
equal to a quark, the cross section can be written as 

(3.2.4) 



da 



(0) 



^ 27r ^ 2tt 



+ 9 



(0) 



a 



q 
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c d 



Figure 3.2. Diagrams giving corrections of order to the point-like quark-current 
cross sections. 



where we have neglected the sum over the flavors. It can be shown by direct 
calculation that 



(3.2.5) 



where daf^^ are the Born cross sections and Pij are process independent 
quantities that will be discussed in more detail later. The cross sections d&i 
are regular. Eq. ( p.2.5|) is very important as it allows to factorize the Born 
cross section da^^^ and redefine the PDF including in this redefinition the 
contributions of the coUinear divergences. We define 



q{x, Q^) 
gix,Q') = ^7(°)(x) + 0(a,). 



5(1 -X) + ^Pqq l0g% 



+ 9 



(3.2.6) 



We can proceed in the same way when we have a gluon in the initial state. 
Thus, we have 



q{x, Q') 



g{x, 



a 



91 



27r fi^ 



g(0) + 



•9 



•9 



(0) 

(3.2.7) 

(0) 
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In a matrix form we have 



t^))=^®(l) (3.2.8) 



"Mog^fi^- ^J!^)+0{a^), (3.2.9) 



with 



27r %2 ^p^^ 

where g^^-* and g^^^ are the divergent scale independent distributions, while q 
and g are the renormalized scale dependent distributions. 

We can picture the redefinition of PDFs as follows. As is increased 
to ~ Ql where Ql is some scale characterizing the process, say, the 
photon starts to "see" evidence for the point-like valence quarks within the 
proton. If the quarks were non-interacting, no further structure would be 
resolved increasing Q^: scaling would set in, and the naive parton model 
would be satisfactory. However, QCD predicts that on increasing the reso- 
lution (Q^ ^ Ql), we should see that each quark is itself surrounded by a 
cloud of partons. The number of resolved partons which share the proton's 
momentum increases with Q^. 

The scale /x^ is not physical, being introduced arbitrarily in the coUinear 
divergences subtraction. Varying /i^ we have a corresponding variation in 
the cross section at the order as- As a variation of as gives a contribution 
of order a^, these correction are negligible. If we had the whole series in as, 
all the terms proportional to fi^ would cancel each other and we would not 
have any dependence on /i^. Thus, the /i^ dependence is a consequence of the 
truncation of the perturbative expansion. Usually, we assign /i^ a value of 
the order of the process energy scale, as to avoid large logarithm to appear. 
In the DIS case we choose /x^ ~ Q^. 

3.2.1 Coefficient functions 

With the redefinition of PDFs given in eq. ( |3.2.7| ), the cross section in eq. 
(|3.2.4|) can be written as 



Os 

= C\x)®q{x,Q^)+C'{x)®g{x,Q^), (3.2.10) 

where C^'^ are the coefficient functions that contain the finite part of the 
partonic cross section. The structure function F2 is given by 

F2{x,Q') = x£^||:e;g,(y,Q^)[5(^-l)+^rfa«(a:) 

+ 2nfg{y,Q')^da^^\x)'^ (3.2.11) 
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where we have used the fact that dajj^\x) = 6{1 — x). In the MS scheme the 
NLO contributions to the coefficient functions are given by |15 



log(l — x) 

1 — X 

1 + 



2 V 1 -a; 



^ _ ^ \ogx + 3 + 2a; - ( y + - ) 5(1 - x) 



9 



Tr 



(3.2.12) 
'1 + x) log(l — x) 



(3.2.13) 



((1 - x^) + x^) log - — - - Sx^ + 8x - 1 



X 



The next-to-next-to-leading order (NNLO) contribution to coefficient func- 
tions have been calculated in |]rB[. The coefficient functions are process de- 
pendent quantities and can be computed from the Renormalization Group 
Equation approach to the Operator Product Expansion of two currents. They 
describe the deviation from the canonical behavior of free field theory. 



3.3 Evolution equations 

3.3.1 Scaling violations and the Altarelli-Parisi 
equations 



The redefinition of parton distributions given in eq. ( p.2.8|) , introduces a scale 



dependence, and the redefinition of parton distribution means that we have 
substituted the bare quarks and gluons with clouds of partons. This is right 
what it happens when we renormalize a coupling constant. We will now 
derive the analogue of the RGE for parton distribution. Thus we require the 
invariance of eq. (|3.2.8|) from the scale /i^; we get 



d f q\ _ Ots [ Pqq Pqg \ ^ I q 



where t = log^, which is the Altarelli-Parisi equation for the unpolarized 



case 



Il7| . We can interpreted this equation saying that a quark with momen- 
tum fraction x can be produced by another quark with a larger momentum 
fraction y which has emitted a gluon. The functions Pij are the Altarelli- 
Parisi splitting functions. They are calculated perturbatively as power series 
in as- At the leading-order they have an attractive physical interpretation as 
the probabilities of finding a parton of type i in a parton of type j with a frac- 
tion X of the longitudinal momentum of the parent parton and a transverse 
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momentum squared much less than /i^. The interpretation as probabihties 
imphes that the sphtting functions are positive definite for x < 1, and satisfy 
the sum rules 



/' 

^0 



dxPqq{x) 



0, 



/ dxx[Pgg{x) + Pgg{x)] = 0, (3.3.2) 

/ dXX[2nfPgg{x) + Pgg{x)] = C 

Jo 

The leading-order [|l^ and the next-to-leading-order |18| contribution to the 
Altarelli-Parisi splitting functions have been calculated. Here we show only 
the LO contributions: 



^-^±^ + hil-x) 



Tn[x^ + {l-xf] 
l + il-xY^ 



Cf 
2N 



X 



X 



where 1/(1 — x)+ means: 



H + x{l — x) 

X 



5(1 -X 



(3.3.3) 



(llN-AnfTn) 



\x- 



(1-x). 



1 — X 



(3.3.4) 



In the general case in the Altarelli-Parisi equations we have a (2n/ + 1)- 
dimension matrix in the space of quarks and gluons. However, not all parton 
distributions evolve independently and we can restrict ourselves to two case. 
We define the non-singlet and the singlet parton distributions as 



q(^'\x,Q') = X^(g.(a;,Q2)-g,(x,Q2)), 
E(x,Q2) = Y.^q^{x,Q') + q.{x,Q')), 



(3.3.5) 
(3.3.6) 



i=l 



where in the non-singlet distribution definition qj may coincide with gj. Thus 
the analysis of the evolution is simplified. As the gluon emission is flavor 
independent, in the difference between two distributions the gluonic terms 
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cancel and the evolution of the non-singlet term is independent from the 
gluon term. We have 

= ^ [P., ® g^^^T . (3.3.7) 

In the singlet case we have a mixing between quarks and gluon in the evolu- 
tion and the Altarelli-Parisi equation is given by 

dt\gj 27, \P„ P,„ J \gj 

Note that the parton evolution is causal, i.e. known the parton distributions 
at an initial scale Qq, we know their values at an arbitrary value Q^. This 
provides to be an essential tool to fit experimental data. 



3.3.2 Solution of the Altarelli-Parisi equations 

There are different techniques to solve the Altarelli-Parisi equations. One 
can solve them by performing numerically the convolution integrals starting 
from input distributions obtained from data. This is particularly convenient 
if a simultaneous solution of all 2nj + l parton distributions is required, given 
a set of starting functions q{x, Ql) at some reference scale Q^. This approach 
is adopted e.g. in Monte Carlo simulations for parton branching processes 
0. A second way is the analytical solution of the evolution equations in 
the Mellin moment space, where the convolution products turn into ordinary 
ones. Mellin moments are defined by 

/„= / dxx'^-'fix). (3.3.9) 
Jo 

In this section, we find the general solution of the equation 

■fq = Cq, (3.3.10) 

dr 

where g is a vector with M components, and C is a generic M x M matrix. 
The usual QCD evolution equations are special cases of this equation in which 
M < 2. We will assume that C has a perturbative expansion in powers of a 
parameter a(r): 

C = Co + a{T)Ci + ... , (3.3.11) 

where 

"^4^ = -boa {l + ha + ...) , (3.3.12) 
dr 
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and bo = 2-71 b and bi = 271 bi, with b and b' given in eqs. ( |2.4.4D and ( ^^.4.51 ). 
For QCD applications 



a = — : r = — / dt agit 
27r' 271 ' ^ 



(3.3.13) 



to 



witht = log(QVA'gcz,)-^ 

The solution of eq. (|3.3.1CI|) can be obtained perturbatively. Expanding q 

to order a, 



we find 



d 

Jt 

^ „(i 



(0) 



dr 



= (Co + bo) g« + . 



The solutions of eqs. ( |3.3.15|) and ( p.3.16|) are 



g(0)(r 



= i?-ie^"i?g(°)(0) , 
g(i)(r) = i?-ie(^+''«)"i?gW(0) 



(3.3.14) 

(3.3.15) 
(3.3.16) 



(3.3.17) 
(3.3.18) 



where the matrix i? diagonalizes Cq, 

RCoR~^ = diag(7i, . . . , 7m) = 7 

and 

Ci = RCiR ^ . 



(3.3.19) 



(3.3.20) 



Collecting these results, and noting that aexp(6oT) = a(0), up to terms 
of order a^, we can write the solution as 



g(r) = f/(C, r)g(O) = i?"^ e^" + ae^^+'"'^^ / rfae-^-^+^'^^'C'ie^" 

L Jo 

with the initial condition 

g(0) =gW(0)+a(0)g«(0)• 
The explicit expression of U{C, r) is 

p7"T _ p(7™+feo)r' 



i?g(0), 
(3.3.21) 

(3.3.22) 



f/,,(C,r) = i?^ 



-1 



5^„e^ - + a(r)Cr 



-Rnj 5 (3.3.23) 
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which, expanded to next-to-leading order reduces to 



U,,{C,t) 




a(r) 



a(0) 



) 



In /bo 




+ 



7™ - 7" + 60 



nj ■ 



(3.3.24) 



In the case of standard QCD evolution equations the matrix Cq is at most 
2x2 and is easily diagonalized. 

3.4 Factorization schemes 

The way we subtract divergences is arbitrary. We may add a finite term 
of order to Z defined in eq. ( |3.2.9| ), obtaining an infinity of choices, all 
equivalent to each other as it happens for ultraviolet renormalization schemes. 
Once we fix the finite term, a coUinear subtraction scheme is defined. 

If we sum over the number of fiavors eq. ( p.2.8| ) in the Mellin space, we 
can write the column vector as (here the index n is omitted to simplify the 
notation) 



As the matrix Z is not uniquely defined up to term of order a^, we can 
consider two transformations: 



where Z and Z' are 2x2 matrices. As go is independent of the factorization 
scale /i we get 




(3.4.1) 



g(0) = Zq = Z'q' 

q' = Wq, W = I+—E 



(3.4.2) 
(3.4.3) 




0. 



(3.4.4) 



From the Altarelli-Parisi evolution equations we obtain 




(3.4.5) 



(3.4.6) 
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If we perform a dimensional regularization, the Z-matrices can be written as 



ZTT e 



Z' = I+^^Z' + 0{al) = ZW~' + 0{al) 
Ztc e 



We have 



-WZ~^ 



d 



dt 



—Z W~^ + Z [ —W~^ 



d 



dt 



and 



with 



^ P' = ^ wpw-^ - W-W-' 

271 271 dt 



We obtain 

P' = WPW-^ -asbE + O(a^). 
The substitution of W with its expansion ( p.4.3|) yields 



P' 
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E] -ba,E 



P+^[E,P]-^boE. 



(3.4.7) 
(3.4.8) 

(3.4.9) 



(3.4.10) 



(3.4.11) 



(3.4.12) 



(3.4.13) 



Expanding P in powers of ag at NLO and comparing terms proportional to 
same powers of a^, we finally obtain 



p'(0) ^ p(0) 

p/(i) = P« + [E, P(°)] - 



(3.4.14) 
(3.4.15) 



3.4.1 The unpolarized case: MS^ DIS 

We will apply the results of the previous Section to the particular case in 
which the starting scheme is the MS and the final scheme is the DIS. We 
will write the (n — 1)*'' moment of F2 in both the schemes. Since F2 is 
independent of the factorization scheme, comparing the two expression we 
will find a transformation between the schemes. 
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In the MS scheme, the Melhn moment of eq. ( ^.2.111 ) with the definition 
of non-singlet and singlet PDFs given in eqs. ( p.3.5|) and ( p.3.6|) , yields 



F2 

X 



(3.4.16) 



where 



(3.4.17) 



C9 — ^ na 



are the Mellin moments of the coefficient functions eqs. ( p.2.12| ) and ( p.2.13| ) 
and (e^) = 'Yl,i=q q^f I {'^''^ f) ■ ^ partonic scheme |jl9| we impose that the 
quark distribution is identified with F2 (as it is in the parton model at LO) 
at any order in 



F2 

X 



(3.4.18) 



If we take 



E 



Eqq{x) 2nfEgg{x) 



Egqix) 



(3.4.19) 



comparing the two expression of F2, we get 

+ g + + £ + • ^3.4.20) 

It follows 



E 



E. 



gq 



E. 



99 



and 



E 



NS 



(3.4.21) 



(3.4.22) 



The two other matrix elements of E specify completely the scheme. Here 
we will fix the scheme by assuming that all moments satisfy the relations 
between the parton-scheme gluon and the MS quark and gluons imposed by 
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momentum conservation on second moments pp] , PHp|. This is the prescrip- 
tion used in common parton sets, and usually referred to as DIS scheme. For 
n = 2 we have 



rfxxKi)(x) + p;«( 

dxx [-bo{E,, + E,,) - E,,{2nfPf){x) + pW(^)) + pi^){x){E,, + 

Jo 



2nfE,,) + E,,Pf^{x) - E,,Pf^{x) + P^^^x) + P,(J 



[X 



dxx[2nfP'^^p{x) + P'}p{ 

I dxx [-ho{2nfE,, + E,,) - 2n^E,,{Pf^{x) + Pf^{x))^ 
Jo 

^nfE,,P^f{x) - 2n^E,,p(J)(x) + 2nfP^^\x){E,, + E,,) 



+2n/PW(x)+pW(x)| =0 



From eq. ( |3.3.3| ) we get 



E, 
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-E. 



qq 



-CI E,, = -2nfE,, = -2nfC^ . 



Thus, generalizing for all n, we have 



-CI -2njCi 



(3.4.23) 



(3.4.24) 



(3.4.25) 



^ This condition is verified by the second moments of the NLO splitting functions in 
the MS scheme and fix their behavior at a; = 1 
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4 



Truncated Moments of parton 
distributions 



In the previous Chapter we have introduced the AltareUi-Parisi equations 
which describe the evolution of parton distributions. As we have seen, we 
can solve the evolution equations analytically by taking their Mellin trans- 
form, which turns convolution products into ordinary ones, and therefore 
the x-space integro-differential equation into a set of independent ordinary 
first order differential equations. Usually, a parametrization of the parton 
distributions is assigned at some initial scale, and the parameters are then 
determined by fitting to data the evolved distributions. Mellin moments of 
structure functions, however, cannot be measured even indirectly, since they 
are defined as integrals over the whole range < a; < 1, and thus require 
knowledge of the structure functions for arbitrarily small x, i.e. arbitrarily 
large energy W'^ = ^-iiz£)_ 

We can solve this problem using the Altarelli-Parisi equation to evolve 
parton distributions directly: the scale dependence of any parton distribu- 
tion at Xq is then determined by knowledge of parton distributions for all 
X > xo, i.e., parton evolution is causal. In fact, through a judicious choice of 



factorization scheme |[19| , pi| all parton distributions can be identified with 
physical observables, and it is then possible to use the Altarelli-Parisi equa- 
tions to express the scaling violations of structure functions entirely in terms 
of physically observable quantities. It is, however, hard to measure local scal- 
ing violations of structure functions in all the relevant processes: in practice, 
a detailed comparison with the data requires the solution of the evolution 
equations. 

The frequently-adopted input on the x dependence of the parton distri- 
butions at the initial scale is for example 



q{x,Ql) = aox'^' {1 - x^ P{x- ,a3, . . .) 
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where Ql is a reference scale. The parameter ai is associated with the small-x 
behavior while 02 is associated with the large-x valence counting rules. The 
term P{x; , as, . . .) is a suitably chosen smooth function, depending on one 
or more parameters, that adds more flexibility to the parton distributions 
parametrization. It has however become increasingly clear that in practice 
this procedure introduces a potentially large theoretical bias, whose size is 
very hard to assess [23]. In Ref. it was proposed to adopt a functional 
method to keep this theoretical error under control. Another suitable way 
to minimize the bias introduced by the parton distributions parametrization 
is to project the parton distributions on an optimized basis of orthogonal 
functions. Different methods have been suggested with suitable families of 
orthogonal polynomials {e.g. Bernstein Jacobi [^| or Laguerre polyno- 
mials [^) as basis of function. A different approach has been suggested in 
Ref. which makes use of truncated moments of parton distributions. 

In this Chapter the technique of truncated moments will be extended 
from its original formulation for the non-singlet parton distributions to all 
flavor combinations, and we will show its numerical implementation. 

In the following we will introduce the relevant evolution equations for 
truncated moments, and the corresponding solution. We will assess the ac- 
curacy of the method, and discuss techniques for phenomenological applica- 
tions [53]. In the last Section, we will show how a numerical technique to 



increase the numerical efficiency of the method will provide to be a useful 
tool to solve the Altarelli-Parisi equation [RO . 



4.1 Evolution equations for truncated moments 
and their solutions 

In Sect. ^.3.2| it has been described the solution of the Altarelli-Parisi equa- 
tions eq. ( p.3.1|) by taking ordinary Mellin moments. Here, we are interested 
in the evolution of truncated moments, defined for a generic function f{x) 
by 

fn{Xo)= [ dxX^-'f{x). (4.1.1) 

One finds immediately that the truncated moments of q{x, Q"^) obey the 
equation 

^Qnixo, Q') = dyy^-'Gn {^^ qiy, Q') , (4.1.2) 
with T given by eq. ( p.3.13|) and where 

Gn{x) = [ dzz^'-^Piz) (4.1.3) 
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is perturbatively calculable as a power series in a^. 
Expanding Gn{xo/y) in powers of y around y = I, 



y 



E 



—G 

dyP " V 



(4.1.4) 



y=l 



one obtains 



d 



-^g„(xo,Q^) 
dr 



EE 

p=0 k=0 



:-i)^+%"(xo) 
k\{p-k)\ 



Qn+k 



(4.1.5) 



The key step in the derivation of eq. ([4.1.5|) is the term-by-term integration 
of the series expansion. This is allowed, despite the fact that the radius of 
convergence of the series in eq. ( [4.1.4|) is 1 — xq, because the singularity of 
Gn{xo/y) at y = Xq is integrable (this can be proved using the Lebesgue 
definition of the integral). One can then express each power of {y — 1) using 
the binomial expansion, which leads to eq. (^4.1.5|) . 

Equation ( [4.1.5|) expresses the fact that, while full moments of parton 
distributions evolve independently of each other, truncated moments obey 
a system of coupled evolution equations. In particular, the evolution of the 
moment is determined by all the moments qj, with j > n. In practice. 



n 



th 



the expansion in eq. ([4.1.4|) , because of its convergence, can be truncated 
to a finite order p = M. The error associated with this procedure will be 
discussed in Sect. ^]2|. In this case, eq. ([4.1.5| ) can be rewritten as 



d_ 
d^ 



M 



qn{Xo, Q^) = Y1 ^nk\^o) qn+k{Xo, 



(4.1.6) 



k=0 



where 



M 



^nk 



:-l)P+^^g(a:o) 
k\{j>-k)\ 



(4.1.7) 



To solve the system of equations ([4.1.6| ), it is necessary to include a decreasing 
number of terms (M, M — 1, and so on) in the evolution equations for higher 
moments (ra + 1, n + 2, . . . ), obtaining M + 1 equations for the M + 1 
truncated moments {g^, . . . , g„+Af}- We will see in the next Section that this 
approximation is fully justified. In this case, the coupled system of evolution 
equations takes the form 



d n+M 
l=n 



n<k<n + M, 



(4.1.8) 
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where C is now a triangular matrix: 



Cki 
Cki 







(/ < k) 



(4.1.9) 



In the non-singlet case, discussed in Ref. [^], the matrix elements Cki 
are just numbers, and the matrix C in eq. ([4.1.9| ) is triangular, which makes 
it easy to solve eq. ([4.1. 8|) perturbatively. In the singlet case, each entry Cm 
is a 2 X 2 matrix. As a consequence, the matrix C, which is given in terms 
of partial moments of the evolution kernels, is no longer triangular, but has 
non-vanishing 2x2 blocks along the diagonal. 

Here, we will show how to solve the singlet case that easily reduce to the 
non-singlet case. By writing the perturbative expansion of C, we get 



C = Co 
where A = Aq -\ 



h aCi + . . . ={Ao + Bo) + a{Ai + Bi) + . . . , (4.1.10) 
aAi is block-diagonal, with 2x2 blocks on its diagonal, 

Aki = CkkSki, (4.1.11) 



while B 



5n 



aBi, considered as a matrix of 2 x 2 blocks, is upper- 
triangular with vanishing diagonal entries. Now one can define a matrix S 
that diagonalizes Aq, 



SAnS' 



diag(7 



1) 



>72mJ 



(4.1.12) 



Clearly, S is r-independent, block-diagonal, and easily computed. Equa- 
tion ( |4.1.(j| ) can then be rewritten as 



d 



q = Tq, 



(4.1.13) 



where q = S q and T = SCS^^. 

The new evolution matrix T is triangular at leading order (with the same 
eigenvalues as Aq). This is enough to solve the evolution equation to next- 
to-leading order. The general solution has been worked out in detail in 
Sect. 13.3.21; the result is 



g(r) = [/(r,r)g(0) 



where 



U^J{T,T) 



+ 




Ik I bo 



a(r) 



a(0) 
a(r) 



ll/bo 



(4.1.14) 



(4.1.15) 



R] 
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In eqs. ( [4.1.14D and ( |4.1.15| ), T is expanded as T = Tq + aTi, R is the matrix 
which diagonahzes Tq, RTqR~^ = diag(7i, . . .72m); finally, Ti = RTiR~^. 

The matrix R can be computed recursively, using the technique applied 
in Refs. p8| , p9| and proved in Appendix A. One finds 

i-i 



Ri 



1 

7i - 7j ^ 



7i - 7i 



^ -"p j ) 



(4.1.16) 
(4.1.17) 



p=i+l 



which, together with the conditions Ru = 1 and Rij = when i > j, deter- 
mine the matrix R completely. 

The general solution for the parton distributions is then 



where 



g(r)=f/(C,r)g(0), 



U{C,r) = S-^U{T,t)S. 



(4.1.18) 



(4.1.19) 



We have calculated the splitting functions and partial moment integrals 
which should be used in eq. ( [4.1.15| ) in order compute this solution explicitly, 



and they are listed in Appendices C and D of Ref. [29 



For the sake of completeness, we describe a different method to solve 
eq. ( ^4.1.61) . It is immediate to check that the matrix 



u{C,t)=i + J2 / dn... 



n=l 



rfr„C(ri)...C(r„) 



(4.1.20) 



obeys the differential equation 

d 



dr 



U{C, t) = CU{C, t) 



(4.1.21) 



with the initial condition U{C, 0) = /. In general, eq. ( [4.1.20| ) is not very 
useful, since it involves an infinite sum. In the present case, however, the 
infinite sum collapses to a finite sum. To see this, consider again the decom- 
position C = A + B, where A is block-diagonal and B is upper-triangular. 
It is easy to prove that 



where 



U{C,T) = U{A,r)U{B,r) 
B = U-\A,t)BU{A,t) . 



(4.1.22) 
(4.1.23) 
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Since A is block diagonal, U {A, r) is also block-diagonal, and it can be com- 
puted perturbatively using the procedure described in Sect. 3.3.2. Further- 
more, once U{A) is known, the upper-triangular matrix B can be computed 
through eq. ( [4.1.23|) . Now one can use the fact that upper-triangular ma- 



trices have the property that their M*'^ power vanishes. Hence, the solution 
can be expressed as the finite sum 



M—l 

U{B, r) = / + V / rfri . . . / " ' dr^Bin) . . . fi(r„) , (4.1.24) 
n=i -^0 Jo 

and from the knowledge of U{B) and U{A) one can determine the solution 
to the evolution equations explicitly. 



4.2 Numerical methods and their accuracy 

In this Section we will assess the accuracy of our method when the series 
of contributions to the right-hand side of the evolution equation ( [4.1.5| ) is 
approximated by retaining a finite number M of terms. The loss of accuracy 
due to this truncation is the price to pay for eliminating the dependence 
on parton parametrizations and extrapolations in the unmeasured region. 
However, unlike the latter uncertainties, which are difficult to estimate, the 
truncation uncertainty can be simply assessed by studying the convergence 
of the series. A reasonable goal, suitable for state-of-the-art phenomenology, 
is to reproduce the evolution equations to about 5% accuracy: indeed, we ex- 
pect the uncertainties related to the parametrization of parton distributions 
in the conventional approach to be somewhat larger (~ 10%)[|. Notice that 
there is no obstacle to achieve a higher level of precision when necessary, by 
simply including more terms in the relevant expansions. To this level of ac- 
curacy it is enough to study the behavior of the leading order contribution to 
the evolution equation: indeed, next-to-leading corrections to the anomalous 
dimension are themselves of order 10%. We have verified explicitly that the 
inclusion of the next-to-leading corrections does not affect our conclusions. 

We can compare the exact evolution equation ( |4.1.5| ) with its approximate 
form, eq. ( fl.l.6|) , by defining the percentage error on the right-hand side of 



the evolution equations for the quark non-singlet, singlet and gluon: 



NS JxQ 



M 

' ^NS 



(4.2.1) 



^Notice that this is not the uncertainty associated with evolution of a given parametriza- 
tion with, say, an x-space code; rather, it is the uncertainty associated with the choice of 
the parametrization, and with the bias it introduces in the shape of the distribution. 
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7?^ — 

'1^ Jxo 



dy y 



n-1 



+ 



Km = irr dy v^~" 



+ 



y 



y 



( ^ 

y 



G3J ( ^ 




where MNs,T.,g are the exact right-hand sides of the evolution equation ( [4.1 .51) . 
We study the dependence of the percentage error on the value of M for 
typical values of the cutoff Xq and for representative choices of test parton 
distributions. In particular, we parametrize parton distributions as 



aox 



1 — X 



I<I2 



(4.2.4) 



We begin by choosing, as a representative case, 02 = 4 and Oi = 1 for the sin- 
glet distributions and ai = for the non-singlet. The non-singlet is assumed 
to behave qualitatively as qns ~ ~ xS, in accordance with the behavior 
of the respective splitting functions. Furthermore, the normalization factors 
Qq for the singlet and gluon are fixed by requiring that the second moments 
of E(x,Q^) and g{x,Q'^) are in the ratio 0.6/0.4, which is the approximate 
relative size of the quark and gluon momentum fractions at a scale of a few 
GeV^. We will then show that changing the values of ai and 02 within a 
physically reasonable range does not affect the qualitative features of our 
results. 

The accuracy of the truncation of the evolution equation is determined 
by the convergence of the expansion in eq. (|4.1.4|) . Because this expansion 
is centered at ?/ = 1, and diverges at |/ = xq, the small y region of the inte- 
gration range in eq. (|4.1.2D is poorly reproduced by the expansion. Hence, 
even though the series in eq. ( |4.1.5[ ) converges, as discussed in Sect. 4.1, the 
convergence will be slower for low moments, which receive a larger contribu- 
tion from the region of integration y xq. In fact, for low enough values of 
n, the convolution integral on the right-hand side of the evolution equation 
(|4.1.2|) does not exist: this happens for the same value for which the full mo- 
ment of the structure function does not exist, i.e. n < 1 in the unpolarized 
singlet and n < in the unpolarized non-singlet and in the polarized case. 
Therefore, we concentrate on the lowest existing integer moments of unpo- 
larized distributions, i.e. the cases n = 2,3 for the singlet distributions, and 
correspondingly n = 1,2 for the non-singlet, which are the cases in which the 
accuracy of the truncation will be worse. 
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Xq = 0.1 


M 








'^2,M 






5 


0.63 


0.43 


0.55 


0.16 


0.12 


0.16 


10 


0.49 


0.36 


0.38 


0.13 


0.10 


0.12 


20 


0.34 


0.27 


0.26 


0.10 


0.08 


0.08 


40 


0.20 


0.17 


0.17 


0.06 


0.05 


0.05 


70 


0.12 


0.10 


0.10 


0.04 


0.03 


0.03 


100 


0.09 


0.07 


0.07 


0.03 


0.02 


0.02 


150 


0.06 


0.05 


0.05 


0.02 


0.01 


0.01 



Table 4.1. Values oflZ^ ^.j- for xq = 0.1 and different values of n and M. 









Xo = 


0.01 






M 




'^2,M 


'^2,M 


'^2,M 






5 


0.80 


0.12 


-42.81 


0.0050 


0.0024 


0.0080 


10 


0.71 


0.12 


-34.67 


0.0047 


0.0024 


0.0071 


15 


0.64 


0.11 


-29.23 


0.0044 


0.0024 


0.0064 


20 


0.59 


0.11 


-25.28 


0.0042 


0.0023 


0.0059 



Table 4.2. Values ofTi^j^ and 1^% for xq = 0.01 and different values of N and 
M. 



The values of 7?,^^^'^, computed at leading order with xq = 0.1, are shown 



in Table 4.1 



The table shows that non-singlet moments of order n behave as singlet 
moments of order n — 1. This is a consequence of the fact that, as dis- 
cussed above, the convergence of the expansion is determined by the singu- 
larity of the integrand G„ (xo/y) q{y) of eq. ( [4.1.2| ) as y — ^ Xq; near y = xq, 
the function Gnixo/y) is well approximated by the singular contribution 
log (1 — Xo/y), while parton distributions carry an extra power of y~^ in the 
singlet case in comparison to the non-singlet. We also observe in Table 
that, as expected, the convergence is slower for the lowest moments, and 
rapidly improves as the order of the moment increases. This rapid improve- 
ment is a consequence of the fact that the convergence of the expansion of 
G{xo/y) is only slow in the immediate vicinity of the point y = xq, and the 
contribution of this region to the n*^ moment is suppressed by a factor of 
Xg"^. Due to this fast improvement, the approximation introduced by includ- 
ing one less term in the expansion as the order of the moment is increased 
by one, which is necessary to obtain the closed system of evolution equations 
(|4.1.8| ), is certainly justified. 

The 5% accuracy goal which we set to ourselves requires the inclusion of 
more than 100 terms for the lowest moment, but only about 40 terms for 



42 



4.3- Techniques for phenomenological applications 



the next-to-lowest. The computation of series with such a large number of 
contributions does not present any problem, since the splitting functions are 
known and their truncated moments are easily determined numerically. The 
implications of this requirement for phenomenology will be discussed in the 
next Section. We can now study the dependence of these results on the value 
of the truncation point Xq by plotting the exact and approximate right-hand 



side of the evolution equations as a function of shown in Fig. |4.1| . 

The figures show that the case Xq = 0.1 studied in Table is a generic 
one between the two limiting (and physically uninteresting) cases Xq = and 
Xo = 1, where the approximation is exact. In fact, with this particular choice 
of parton distributions, xq = 0.1 is essentially a worst case and the error 



estimates of Table [4.1| are therefore conservative. 

An interesting feature of these plots is the presence of zeroes of the lowest 
moment evolution at Xq = in the non-singlet and around Xq ~ 10~^ in the 
gluon case. The physical origin of these zeroes is clear. At leading order, the 
first non-singlet full moment does not evolve. On the other hand, the second 
gluon full moment grows with Q^, while higher gluon full moments decrease, 
i.e. the gluon distribution decreases at large x; this imphes that the second 
truncated moment of the gluon must decrease for a high enough value of 
the cutoff Xo, while it must increase for very small xq; its derivative is thus 
bound to vanish at some intermediate point. Of course, the phenomenology 
of scaling violations (such as a determination of ag) cannot be performed at 
or close to these zeroes, where there is no evolution. From the point of view 
of a truncated moment analysis, this means that the value of Xq should be 
chosen with care in order to avoid these regions. 

In Table |4.2| we show for sake of completeness values of TV^^^'^ for xq = 



0.01. The large percentage errors on the first moment of the gluon are due 
to its zero value as it has been discussed above. 



Finally, in Table |4.3| we study the dependence of our results on the form 
of the parton distributions, by varying the parameters ai and 02 within a rea- 
sonable range. Of course, parton distributions which are more concentrated 
at small y give rise to slower convergence. However, we can safely conclude 
that the effect of varying the shape of parton distributions is generally rather 
small. We have also verified that varying the relative normalization of the 
quark and gluon distributions has a negligible effect on the convergence of 
the series, even though it may change by a moderate amount the position of 
the zeroes in gluon evolution discussed above. 

4.3 Techniques for phenomenological applications 

So far we have discussed scaling violations of parton distributions. In a 
generic factorization scheme, the measured structure functions are convo- 
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Figure 4.1. Right-hand sides of the evolution equations for the first and second 
truncated moments of the non-singlet distribution, and for the second and third 
moments of singlet distributions. The overall scale is set by as{2GeV^). 



lutions of parton distributions and coefficient functions. When taking mo- 
ments, convolutions turn into ordinary products and moments of coefficient 
functions are identified with Wilson coefficients. In the present case, how- 
ever, as shown in eqs. (|4.1.2| - |4?T75| ), truncated moments turn convolutions 
into products of triangular matrices. Hence, in a generic factorization scheme, 
truncated moments of parton distributions are related to truncated moments 
of structure functions by a further triangular matrix of truncated moments 
of coefficient functions. 



This complication can be avoided by working in a parton scheme |]T9| 



where the quark distribution is identified with the structure function F2. This 
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n = d, 


Xo = 


U.I 




i.O 


9 n 

Z.U 


U.U ( 


n IT? 

U.UO 


n 07 

U.U ( 


U.UO 


1.0 


2.0 


0.04 


0.02 


0.04 


0.02 


0.5 


2.0 


0.02 


0.01 


0.02 


0.01 


1.5 


4.0 


0.12 


0.05 


0.11 


0.05 


1.0 


4.0 


0.08 


0.03 


0.08 


0.03 


0.5 


4.0 


0.05 


0.02 


0.05 


0.02 


1.5 


6.0 


0.16 


0.07 


0.15 


0.06 


1.0 


6.0 


0.12 


0.05 


0.12 


0.05 


0.5 


6.0 


0.09 


0.03 


0.08 


0.03 



Table 4.3. Values ofTZ^j^,^ and TZ^ for xq = 0.1 and different choices of parton 
distribution parameters. 



still does not fix the factorization scheme completely in the gluon sector. One 
way of fixing it is to use a "physical" scheme, where all parton distributions 
are identified with physical observables PT|. This may eventually prove the 
most convenient choice for the sake of precision phenomenology, once accu- 
rate data on all the relevant physical observables are available. At present, 
however, the gluon distribution is mostly determined from scaling violations 
of F2, so within the parton family of schemes the choice of gluon factoriza- 
tion is immaterial. Here we will fix the scheme by assuming that all moments 
satisfy the relations between the parton-scheme gluon and the MS quark and 
gluons imposed by momentum conservation on second moments |^0|]. This 
is the prescription used in common parton sets, and usually referred to as 
DIS scheme. The explicit expressions of Altarelli-Parisi kernels in the DIS 



scheme that we have evaluated are given in Appendix D of Ref. ||2^ . 

With this prescription, the phenomenology of scaling violations can be 
studied by computing a sufficiently large number of truncated moments of 
structure functions, so as to guarantee the required accuracy. If the aim is, 
for instance, a determination of from non-singlet scaling violations, all 
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we need is a large enough number of truncated moments of the non-singlet 
structure function. Once an interpolation of the data in the measured region 
is available, the determination of such truncated moments is straightforward. 
This interpolation can be performed in an unbiased way using neural net- 
works, as we will see in the following. One may wonder, however, whether 
the need to use the values of very high moments would not be a problem. 
Indeed, very high moments depend strongly on the behavior of the structure 
function at large y, which is experimentally known very poorly. Furthermore, 
it seems contradictory that scaling violations of the lowest moments should 
be most dependent on the structure function at large y. 

This dependence is only apparent, however. Indeed, eq. ([4.1.5|) for, say, 
the non-singlet first moment can be rewritten as 

%i(xo,Q^) = f;^g?(xo,Q^), (4.3.1) 



dr a P 



where 



qlixo, Q') = f dyy'-\y - Ifqiy, Q') . (4.3.2) 



The need to include high orders in the expansion in eq. ( |4.1.6| ) is due to 



the slow convergence of the series in eq. (|4.3.1D , in turn determined by the 



fact that Gn{xo/y) diverges logarithmically at y = xq. Correspondingly, the 
right-hand side of the evolution equation depends significantly on for large 
values of p, which signal a sensitivity to the value of q{y, Q^) in the neighbor- 
hood of the point y = Xq. The dependence on high truncated moments g„ is 
introduced when is re-expressed in terms of by expanding the binomial 
series for {y — ly. Since this re-expansion is exact, it cannot introduce a 
dependence on the large y region which is not there in the original expres- 
sion. The high orders of the expansion do instead introduce a significant 
dependence on the value of the structure function in the neighborhood of xq, 
which can be kept under control provided xq is not too small, i.e. well into 
the measured region. There is therefore no obstacle even in practice in per- 
forming an accurate determination of from scaling violations of truncated 
moments. 

Let us now consider a second typical application of our method, namely 
the determination of truncated moments of the gluon distribution. In par- 
ticular, the physically interesting case is the lowest integer moment, i.e. the 
momentum fraction in the unpolarized case or the spin fraction in the polar- 
ized case. The need to include a large number of terms in the expansion of 
the evolution equations seems to imply the need to introduce an equally large 
number of parameters, one for each gluon truncated moment. This would be 
problematic since it is appears unrealistic to fit a very large number of pa- 
rameters of the gluon from currently available data on scaling violations. We 
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may, however, take advantage of the fact that the dependence on high order 
truncated moments is fictitious, as we have just seen, and it rather indicates 
an enhanced sensitivity to the value of Q^) as y ^ xq. This suggests 
that a natural set of parameters to describe the gluon distribution should 
include the first several truncated moments, as well as further information 
on the behavior of the distribution around the truncation point Xq, such as 
the value of the distribution (and possibly of some of its derivatives) at the 

point Xq. 

To understand how such a parametrization might work, notice that if 
g(y, Q^) is regular around y = xq, then it is easy to prove that 



J^^dyiy-irqiy,Q') 

lim — ^ 

q{xo, Q2) /^^ dy{y - 1)p 



1 



(4.3.3) 



by Taylor expanding q{y) about y = xq. We may therefore approximate the 
series which appears on the r.h.s. of eq. ( [4.3. 1|) by 



710- 



S{xo,no) 



+ 



E 

p=0 



dy{y~irq{y,Q' 



p=no 



dy{y - 1) 



(4.3.4) 



Equation ( [4.3.4|) describes the evolution of the first truncated moment of 
ihji Q"^) ill terms of the first no truncated moments and of the value of g(|/) 
at the truncation point xq. Of course, the approximation gets better if hq 
increases. It is easy to check that when xq = 0.1 the accuracy is already 
better than 10% when hq ~ 7. This means that a parametrization of the 
distribution in terms of less than ten parameters is fully adequate. It is easy 
to convince oneself that this estimate is reliable, and essentially independent 
of the shape of the distribution q{y,Q^). In fact, because slow convergence 
arises due to the logarithmic singularity in Gn{xo/y), we can estimate the 
error of the approximation in eq. ( [4.3.4| ) by replacing the functions gp{xo)/p\ 
with the coefficients of the Taylor expansion of log(l — xo/y) in powers of 
y — I, which we may denote by gp{xo)/p\- The error is then 



oo ^ 1 / \ 



E 



p=no 



p\ 



dy{y-iy (g(i/,Q2)-g(a;o,g^)) 



(4.3.5) 



< / dy 



no-l -1, 



log(l - xa/y) - ^ ^^^(y 



p=0 



p\ 



The expression inside the first absolute value on the r.h.s. of eq. (|4.3.5| ) is 
just the error made in approximating the logarithm with its Taylor expansion 
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around y = 1; thus, it is a slowly decreasing function of uq, it is integrable, 
and the integral receives the largest contribution from the region y ~ xq; 
the second absolute value, on the other hand, is a bounded function of y in 
the range xq < y < 1, which vanishes as y ^ xq for any choice of q{y, Q"^). 
These two facts combine to limit the size of the error. One can check directly 
that, choosing for example q{y, Q"^) = (1 — y)^ as in the previous Section, the 
accuracy is better than 10% with uq ~ 10 and Xq = 0.1, in agreement with 
the previous estimate. One may also verify that, as expected, changing the 
shape of q{y, Q^) does not significantly affect the result. 



4.4 Solving the Altarelli-Parisi equation with 
truncated moments 



The number M of truncated moments needed to achieve a precision on the 
evolution of the lowest moment comparable to that of other techniques is 
rather large (M ~ 150), and in some cases it may lead to problems in the 
numerical implementation of the method, even if in practice, it is sufficient 
to parametrize the parton distributions using the first few (between 7 and 
10) truncated moments, plus the value of the parton distributions at a; = xq. 
In this Section we present a different way to improve the numerical efficiency 
of the method of truncated moments. We begin by studying the unpolarized 
non-singlet case at leading order, leaving at the end the extension to next- 
to-leading order. 

We now integrate by parts the r.h.s. of eq. ([4.1.21). We get 



[' dyy-~'q{y,Q')Gj^) 

Jxo \y J 



(4.4.1) 



Gn{xo,y)y''~'q{y,Q') 



XQ 



d 



Xo 



dyG,,{xo,y)-{y"-'q{y,Q')) 



where 



Gn{Xo,y) = I dzGni — 



(4.4.2) 



(the lower integration bound is irrelevant here; it has been chosen equal to 



Xq for later convenience). Using the definition of Gn{xo^y) and eq. ( |4.1.3| ) 

we get 



Gn{xo,y) 



XO 



\z I dxx^'-^Pix) 

J Xi)/Z 



1 ry 



dxx'^-'P{x 

xo/y Jxo/x 



yGn ( — ) - XQGn-l ( — 

y J \y 



(4.4.3) 
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By taking the Taylor expansion of Gn{xo,y) around y = 1, we obtain 



Gn{xo,y)y'' q{y,Q 



(AAA) 



E 

p=0 



where 



dP 



Gn{xo,y) 



y=l 



dyP~^ dy 



Gn{xo,y) 



g?r\xo) .{4A.5) 



y=l 



The functions Gn{xo,y) are regular in the whole interval [xq, 1]. In fact, the 
Gn{xo/y) are regular for all values of y except ?/ = xq, as they contain singular 
terms proportional to log(l — Xo/y). However, these terms are integrable, and 
independent of n. Thus, Gn{xo,y) is regular in the limit y xq and tends 
to zero. Furthermore, we observe that the Taylor coefficient of order p of 
Gn{xo,y) is equal to that of Gn{xo/y), times a factor 1/p (see eq. ( f4.4.5| )). 
For this reason, the convergence of the expansion of Gn{xo,y) is faster than 
that of Gn{xo/y). 

Integrating by parts the second term of the r.h.s. of eq. ( 4.4.5 ), we have: 



-^Qnixo^Q'^) 



Gn{xQ,y)y'' \{y,Q 



J Xo 



(4.4.6) 



E 



{y-irf'-'q{y,Q'') 



p=Q 

oo 



pi 



dyy^'-^y 



\p-i „i„. /n2^ 



Truncating the series, expanding the binomial [y — ly ^ with q{l, Q"^) = 
(this is our only assumption on the behavior of the parton distributions), we 

get 

Xn ^q(x,,Qn y ^^(xn-l)P (4.4.7) 



-qn{xo,Q') = Xo"-^g(xo,Q^) f^%^(xo-l)^ 



dr 



p=Q 



p\ 



+ ^\xo)qn+k{Xo,Q'^ 



k=0 



where cl^\xo) are defined as in eq. (|4.1.7| ). Defining the triangular matrix 
as in eq. ( ^.1.9| ) we can finally write the truncated evolution equation as 

j^qnixo,Q^) = 
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xt'Qi^o, Q') Yl ^^(^0 - ir + E Cniqiixo, Q') . (4.4.8) 

p=0 ^' 1=1 



Notice that the first term in the r.h.s. of eq. ( ^4.4.8| ) vanishes in the limit 



M — >■ oo and the original expression given in eq. ( [4.1.8|) with no = 1 is 
recovered. However, for finite values of M this term must be taken into 
account (in a sense, it is the price we have to pay for the better convergence 
of the expansion after the integration by parts). This term poses special 
problems because it depends on the value of the parton distributions at x = 
Xq. In the following we will show how to obtain an approximated expression 
of g(xo,Q^) in terms of a finite number (not necessarily equal to M) of 
truncated moments. The evolution equation ([4.4.8|) will then be solved with 
a technique similar to that presented in Sect. 4.1. 

We begin by taking the Taylor expansion of q{x, Q"^) around x = Uq: 

oo 

g(x, Q') = Y^VkiQ'Kx - yo)'-' , (4.4.9) 
k=i 

The initial point of the expansion, yQ, must be carefully chosen. Parton dis- 
tributions parametrized as in eq. ( [4.2.4| ) are non-analytical in x = 1 when the 



exponent 02 is not an integer; and even in that case, an essential singularity 
in X = 1 is generated by perturbative evolution. One should therefore choose 
Ho ^ + Xq)/2, so that the expansion ( [4.4.9|) is convergent everywhere in 
[xo,l). The series will not be convergent in x = 1, no matter what yo is; 
however, the singularity in x = 1 is integrable, and the term-by-term inte- 
gration is allowed using the Lebesgue definition of the integral as we did in 
Sect. 4.1. We have therefore 

„i 00 

qj{xo,Q^)= dxx^-\{x,Q^) = J2(^jk{xo,yo)Vk{Q^), (4.4.10) 
"^^0 k=i 



where 



PAxo,yo)= [ dxx^-\x-y,f-\ (4.4.11) 
Our task is now to find a way of inverting eq. ([4. 4.101), in order to express 



the coefficients ?7fc(Q^) in terms of the truncated moments gj(xo,<5^). This 
can be done in the following way. Define a matrix (3~^ by 



(4.4.12) 

otherwise 
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where P^^^ is the N x N upper left sub-matrix of /5. For example, in the case 
N = 2 the matrix [3'^ is such that 



/I 



/3l3/322 — /323/3l2 
det 

fl 1 /323/3ll-/3l3/321 

^ ^ dot /3(2) 





(4.4.13) 



Multiplying eq. ( [4.4. 10|) by on the right, we obtain 

N OD N 

E ^i^' (^0, Q') = E E ^^•'^ ^^(^') ' 



(4.4.14) 



where for simplicity we have not shown the dependence on Xo,|/o- Using the 
definition of we get 



AT oo N 

j=l k=N+l j=l 

for i < N. Substituting eq. ( [4.4. 15| ) in eq. ( [4.4.9| ) gives 

AT N 

q{xo, Q^) = E E^fc~/ ^5^) (3^0 - yof'^ + Rixo, yo, 
k=i j=i 



(4.4.15) 



(4.4.16) 



where 



R{xo,yo,Q') = E ^'^(^') 



,xo - yo) 



\k-l 



(4.4.17) 



k=N+l 

N N 



We have thus obtained an approximate expression of g(xo, Q"^) as a function of 
the first truncated moments of g, eq. ( ^4.4.16| ); the quantity R in eq. ( [4.4. 17| ) 
represents the error on this reconstruction. The quantity in square brackets in 
eq. ( [4.4. 17[ ) is independent of the parton distributions, and can be computed 
for any and k starting from the coefficients given by eq. ( [4.4. 11[ ). The 
analytic expression of this quantity is very complicated. We have checked 
that, for = (1 + 2;o)/2, it decreases as [(xq — l)/2]'''~-^, for any value of 
N. Therefore, R{xo,yo,Q^) vanishes, for oo, at least as fast as the 

remainder of order N of the Taylor expansion in eq. ( [4.4.9[) . 

In order to assess the accuracy of our approximation, we have com- 
puted the percentage error given by ratio \R/q{xo,Q^)\ for some represen- 
tative choices of the parton density, namely q{x, Q"^) = (1 — x)"^ with 02 = 
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xo = 0.1 


N 


a2 = 2.5 


a2 = 3.5 


02 = 4.5 


5 


3.3 X 10-^ 


3.2 X 10-^ 


1.4 X 10-3 


10 


3.8 X 10-6 


5.4 X 10-^ 


1.4 X 10-^ 


15 


3.2 X 10-^ 


1.8 X 10-s 


1.8 X 10-9 



Table 4.4. Precision in the reconstruction of q{xo,Q'^) = (1 — xq)"^ in terms of 
a finite number N of truncated moments, for different values of N and for three 
different choices of 02 . 



2.5,3.5,4.5. We have fixed Xq = 0.1 and yo = {1 + Xq)/2. The results are 
shown in Table |4.4] . We see that an excellent approximation is achieved al- 
ready with = 5, independently of the value of the large-x exponent 02. 
The accuracy increases with increasing A^; however, it should be noted that a 
numerical evaluation of the matrix (3~^ requires a numerical precision which 
also rapidly increases with A^. Therefore, for a practical implementation 
of the method, cannot be very large. We see from Table that for 
5 < A^ < 10 the accuracy is already better than 10"'^ in the cases we have 
studied. We conclude that 



TV 



N 



.k=l 



q,{xo,Q^) (4.4.18) 



to an accuracy of about 10-^ for A^ = 5, independent of the detailed shape 
of q{x, Q"^), and rapidly increasing with A^. 

We are now ready to re- write the original evolution equation ( [4.4.8| ) using 
our result eq. ( |4.4.18| ). We get 



d 



M 



M 



^gn(Xo, Q') = J2 C'n; <lli^O, Q') + Yl <lli^O, Q') , (4.4.19) 



1=1 



1=1 



where Cni is defined in eq. ( [4.1.9|) , and 



nl 



D:,' = X 



n-l 




■ M 

E 

.p=0 



{xo - If 



N 



Y,(^ki\^o,yo){xo-yo)'' ^ 



k=l 



(4.4.20) 



We now turn to a test of the accuracy of the evolution equation. We 
will also compare the accuracy achieved with the method presented in this 
Section, and that of Sect. 4.1. The original evolution equation (|4.1.2| ) and 
its truncated version, eq. ( |4.1.6|) , can be written schematically as 



j^qnixo,Q^) 



Sn 



^qn{xo,Q'^) 



(4.4.21) 
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Xq = 0.1 


M 










5 


0.62 


0.14 


0.14 


0.020 


10 


0.48 


0.07 


0.12 


0.016 


20 


0.33 


0.03 


0.09 


0.009 


40 


0.20 


0.01 


0.05 


0.004 



Table 4.5. Comparison between percentage errors for the first and the second trun- 
cated moment at LO with N = 6, xq = 0.1, yo = {1 + xq)/2 and q{x,Q'^) = 
(1 - xf-^. 



respectively. Therefore, the quantity 



7^: 



n,M 



On 



(4.4.22) 



is the same measure of the accuracy of the method given by eq. ( [4.2.1 
Similarly, we write eqs. ( [4.4. 191 ) in the form 

.(xo,Q^) = 5f'-^)+Tr'^^ (4.4.23) 



dr 



qn( 



and define 



1 - 



q{A/- 



1) rpiM,N) 



Sn 



(4.4.24) 



>a,6 



to test the error of the method presented above. The values of com- 
puted at leading order with xq = 0.1 for n = 1 and n = 2, are shown in 
Table [4.5[ for different values of M and N = 6. We observe that the error 

^ ^ computed with the technique presented here is always much smaller 
than the corresponding error presented in Sect. 4.1, TZnM- accuracy of 
less than 10% can be achieved with a relatively small value of M. In Table [4^ 
we show also the comparison between percentage errors with xq = 0.01. We 
observe that, accordingly to what we have discussed in Sect. 4.2, the per- 
centage error on the second truncated moment is smaller for Xq = 0.01 than 
for xo = 0.1, while it is larger for the first truncated moment. 

Finally, in Fig. T?z, we show the xq dependence of the r.h.s. of the evolu- 
tion equation for the first and the second truncated moments. In both cases, 
we note that the approximation of the exact case is very good. 

The complete solution of the evolution equation, LO and NLO terms, can 
be written as 



(4.4.25) 



Rqn{xo,Q, 



o) ' 
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xo = 0.01 


M 










5 


0.78 


0.11 


0.0042 


0.0019 


10 


0.70 


0.06 


0.0039 


0.0017 


20 


0.58 


0.03 


0.0035 


0.0014 


40 


0.46 


0.01 


0.0030 


0.0009 



Table 4.6. Comparison between percentage errors for the first and the second trun- 
cated moment at LO with N = 6, xq = 0.1, yo = {1 + xo)/2 and q{x,Q'^) = 
(1-xP. 

with the initial condition 

qn{xo, Ql) = ql^\xo, QI) + a{0)ql^\xo, Ql) , 

and 

R{Co + Do)R-' = diag(7i,...,7M) = 7, (4-4.27) 
Ci + A = R{Ci + D,)R-K 

The matrix R that diagonalizes Cq + Dq must be computed numerically. This 
is not a problem, since, as we have seen, its dimension M does not need to 
be too large. 

The improvement of the numerical efficiency presented here can be ex- 
tended straightforwardly to the unpolarized singlet case, and to polarized 
partons as well. The tests we presented are only for the LO equations. We 
have checked that the inclusion of NLO terms does not modify our conclu- 
sions. The technique of truncated moments can now be easily implemented 
numerically for phenomenological purposes. 



(4.4.26) 
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Figure 4.2. Right-hand side of the evolution equations for the first and the second 
truncated moments of the non-singlet distribution with M = 10 and N = 6. The 
overall scale is set by as{2GeV^). 
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5 

Introduction to Neural 
Networks 



In the present times artificial neural networks constitute one of the most 
successful and multidisciphnary subjects. People with very different forma- 
tion, ranging from physicists to philosophers, and from biologists to engi- 
neers, are working on them trying to understand better and better how they 
work. Applications of artificial neural networks are widely used from images 
reconstruction to financial markets predictions. Artificial neural network 
methods are well established techniques for high energy physics too, where 
they are used for event reconstruction in particle accelerators. Quark-gluon 
separation, heavy quark jet tagging, mass reconstruction, and track finding 



procedures make use of artificial neural networks |32 



Here we will give a very brief and general introduction to artificial neu- 
ral networks (henceforth simply neural networks). Our main interest is to 
describe the very basic algorithms of neural networks, and some useful prac- 
tical issues in order to better understand the fits presented in the following 
Chapter. 



5.1 From biology to artificial neural networks 

The structure of biological nervous system started to be understood in 1888, 
when Dr. Santiago Ramon y Cajal succeeded in seeing the synapses between 
individual nervous cells, the neurons. This discovery was quite impressive 
as it proved that all the capabilities of the human brain rest not so much in 
the complexity of its constituents as in the enormous number of neurons and 
connections between them. To give an idea of these magnitudes, the usual 
estimate of the total number of neurons in the human central nervous system 
is 10^^, with an average of 10000 synapses per neuron. The combination of 
both numbers yields a total of 10^^ synaptic connections in a single human 
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Figure 5.1. Schematic structure of a neuron. 



brain. All the neurons share the common structure schematized in Fig. |0 
There is a cell body or soma where the cell nucleus is located, a tree-like 
set of fibres called dendrites and a single long tabular fibre called the axon 
which arborize as its end. Neurons establish connections either to sensory 
organs (input signals), to muscle fibres (output signals) or to other neurons 
(both input and output). The output junctions are called synapses. The 
inter-neuron synapses are placed between the axon of a neuron and the soma 
or the dendrites of the next one. 

The way the neuron works is basically the following: a potential difference 
of chemical nature appears in the dendrites or soma of the neuron, and if its 
value reaches a certain threshold then an electrical signal is created in the 
cell body, which immediately propagates through the axon without decaying 
in intensity. When it reaches its end, this signal is able to induce a new 
potential difference in the post-synaptic cells, whose answer may or may not 
be another firing of a neuron or a contraction of a muscle fibre, and so on. 
Of course a much more detailed overview could be given, but this suffices for 
our purposes. 



In 1943 W. S. McCulloch and W. Pitts suggested a mathematical 
model for capturing some of the above characteristics of the brain. First, 
an artificial neuron (or simply a neuron) is defined as a processing element 
whose state ^ at time t can take two different values only: C,{t) = 1, if it is 
firing, OT ^(t) = 0, if it is at rest. The state of, say, the 2*^ unit, ^j(t), depends 
on the inputs from the rest of the N neurons through the discrete dynamical 
equation 

m = (|$^^.^(^ - 1) - o}j , (5.1.1) 

where the weights ojij represent the strength of the synaptic coupling between 
the j^^ and the i^^ neurons, 9i is the threshold which points out the limit 
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Figure 5.2. A three-layer perceptron consisting of input, hidden and output layers. 



between firing and rest, and 9 is the unit step activation function defined as 

("■^) 

Then, a set of mutually connected McCulloch-Pitts units is what is called an 
artificial neural network. 

In spite of the simplicity of their model, McCulloch and Pitts were able 
to prove that artificial neural networks could be used to any desired compu- 
tation, once the weights Uij and the thresholds 6i were chosen properly. This 
fact made that the interest toward artificial neural networks was not limited 
to the description of the collective behavior of the brain, but also as a new 
paradigm of computing. 



5.2 Multilayer Neural Networks 

Among the different types of neural networks, those in which we have con- 
centrated our interest are Rosenblatt's perceptrons, also known as multilayer 



feed- forward neural networks |^ . In these networks there is a layer of input 
units whose only role is to feed input patterns into the rest of the networks. 
Next, there are one or more intermediate or hidden layers of neurons evaluat- 
ing the same kind of function of the weighted sum of inputs, which, in turn, 
send it forward to units in the following layer. This process goes on until the 
output level is reached, thus making it possible to read off the computation. 

In the class of networks one usually deals with, there are no connections 
leading from a neuron to units in the previous layers, nor to neurons further 
than the next contiguous level, i.e. every unit feeds only the ones contained 
in the next layer. Once we have updated all the neurons in the right order, 
they will not change their states. For these architectures time plays no role. 
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In Fig. 572 we have represented a three-layer perceptron with rii input 
units, 712 hidden units in a single hidden layer, and outputs. When an 
input vector ^ is introduced to the network, the states of the hidden neurons 
acquire the values 

<yi = 9 {fl^'S^^k - ) , J = 1, • • • ,^2 ; (5.2.1) 

the output of the network is the vector ( whose components are given by 

= 9 (^E^^?^^- - ,^ = l,...,ns. (5.2.2) 

Here we have supposed that the activation function can be any arbitrary 
function g, though it is customary to work only with bounded ones either in 
the interval [0, 1] or [—1, 1]. If this transfer function is of the form of the 
step function it is said that the activation is discrete, since the states of the 
neurons are forced to be in one of a finite number of different possible values. 
Otherwise, continuous functions commonly used are the sigmoids or Fermi 
functions 



which satisfy 



lim g{h) = e{h) . (5.2.4) 



In the terminology of statistical mechanics the parameter (3 is regarded as 
the inverse of a temperature. However, for our practical applications we will 
set p = l. 

Generally, if we have L layers with ni, . . . , units respectively, the state 
of the multilayer perceptron is established by recursive relations 

=^(^E-r^^"'^-^?) ,^ = l,...,n,J = 2,...,L, (5.2.5) 

where represents the state of the neurons in the l*^ layer, uf^ the weights 

between units in the (/ — 1)*^ and the l^^ layers, and Of^ the threshold of the 
i*'^ unit in the l*'^ layer. Then the input is the vector and the output the 
vector ^^^\ 

By simple perceptron one refers to networks with just two layers, the input 
one and the output one, without internal units, in the sense that there are 
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no intermediate layers. These devices have been seen to have hmitations, 
such as the XOR problem, which do not show up in feed-forward networks 
with hidden layers present. Actually, it has been proved that a network 
with just one hidden layer can represent any Boolean function Any 
continuous function can be uniformly approximated by a continuous neural 
network having only one internal layer, and with an arbitrary continuous 
sigmoid non-linearity |36|. 



5.3 Learning process of Neural Networks 

Connecting neurons together may well produce networks that can do some- 
thing, but we need to be able to train them in order to make them do anything 
useful. We also want to find the simplest learning rule that we can, in order 
to keep our model understandable. As is often the case in neural computing, 
inspiration comes from looking at real neural systems. 

Dogs are given tit-bits to encourage them to come when called. Generally, 
good behavior is reinforced, while bad behavior is reprimanded. We can 
transfer this idea to our network. The guiding principle is to allow neurons 
to learn from mistakes. If they produce an incorrect output, we want to 
reduce the chances of that happening again; if they come up with correct 
output, then we need do nothing. The learning paradigm can be summarized 
as follows: 

• set the weights and thresholds randomly; 

• present an input; 

• calculate the actual output by taking the thresholded value of the 
weighted sum of the inputs; 

• alter the weights to reinforce correct decisions and discourage incorrect 
ones, i.e. reduce the error; 

• present the input next time, etc. 

5.3.1 A simple perceptron learning algorithm 

Let us consider the set 

{(x^,z^) G M" X M'", /x = (5.3.1) 

of pairs input-output and a simple perceptron, i.e. a neural network with 
only two layers. The equations governing the state of the network are given 
by 

C^') = (7(/i,),z = l,...,m, (5.3.2) 
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(2) • (5.3.4) 



where the fields are given by 

n 

i=i 

We shall use the notation for the input and output patterns: 

X = 
o(x) = I 

For any given values of the weights and thresholds it is possible to calculate 
the quadratic error between the actual and the desired output of the network, 
measured over the training set: 

E[o] ^ -J2J2(o.{^') - ■ (5-3-5) 

/i=i j=i 

Therefore, the last squares estimate minimizes E[o]. Applying the gradient 
descent minimization procedure, what we have to do is just to look for the 
direction (in the space of weights and thresholds) to steepest descent of the 
error function (which coincides with minus the gradient), and then modify 
the parameters in that direction so as to decrease the actual error: 



dE 

Suij = -Vg^ = -^(oi(x^) - 4)9'{^i ■ x)x 



(5.3.6) 



= -r/— = -r^(o,(x'^)-zf)^7V.-x) 



where cjj • x = '^j^Q^^ijXj, with the updating rule 

ujij(t + l) — > ujij(t)+6ujij, (5.3.7) 

e,{t + i) ^ e,{t) + 6e,. 

The appearance of the derivative g' of the activation function explains why 
we have supposed in advance that it has to be continuous and different iable. 
The intensity of the change is controlled by the learning rate parameter r]. 
When no more changes (within some accuracy) occurs, i.e. 5uJij ~ 0, the 
weights are frozen and the network is ready to use for data it has never 
"seen" . The procedure is summarized as follows: 

1. initialize Uij with ± random values; 

2. pick pattern p from the training set; 

3. feed input x to network and calculate the output o; 

4. update the weights according to eq. (|5.3.71 ) and eq. ( |5.3.6|) ; 

5. repeat from 2 until uJij{t + 1) ~ uJij{t). 
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5.3.2 Learning by error back-propagation 

We now consider a multilayer neural network and generalize the delta rule 
or gradient descent procedure that we have seen in the perceptron case, 
eq. (|5.3.6| ) |]37| . The equations that describes the multilayer neural network 
are given by 

= g{h?),t = l,...,m, (5.3.8) 

,(0 dE 



(5.3.10) 



dE 



dO. 



(0 



Substituting the first two equations in eq. (|5.3.5| ), and taking the derivatives, 
it is easy to get 



= -r7X^Ape^)^ . = l,...,n,, j = l,...,n,_, 

(5.3.11) 

= r^X^Ap, ^ = l,...,n,, 

^l=l 

where the error is introduced in the units of the last layer through 

= (/.P) [(..(x'^) - zf] (5.3.12) 
and then is hack-propagated to the rest of the network by 

Af^)^ = g' (/.f ^)^') ■ (5-3.13) 



This result can be easily derived by taking a neural network with only one 
hidden layer. Summarizing the batched back-propagation algorithm for the 
learning of the training set (|5.3.1|) consists in the following steps 



1. initialize all the weights and thresholds randomly, and choose a small 
value for the learning rate rj; 
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2. run a pattern of the training set and store the activations of all the 
units (z.e. |ef^'',VM}); 



3. calculate the A\^^^ and then back-propagate the error using eq.( 5.3.l"^ ); 



4. compute contributions to 6ujfj and S9f \ induced by this input-output 
pair (x'^, z^); 

5. update weights and thresholds; 

6. go to the second step unless enough training epochs have been carried 
out. 

The adjective "batched" refers to the fact that the update of the weights 
and thresholds is done after all patterns have been presented to the network. 
Nevertheless, simulations show that, in order to speed up the learning, it is 
usually preferable to perform this update each time a new pattern is pro- 
cessed, choosing them in random order: this is known as non-batched or on- 
line back-propagation. Generally, we choose the on-line mode if the number 
of patterns is small {i.e. ~ 10^), and the batched one if it is large. 

It is clear that back-propagation seeks minima of the error function given 
in eq. ( ^.3.5|) , but it cannot ensure that it ends in a global one, since the 



procedure may get stucked in a local minimum. Several modifications have 
been proposed to improve the algorithm so as to avoid local minima and to 
accelerate its convergence. One of the most successful, simple and commonly 
used variants is the introduction of a momentum term to the updating rule, 
either in the batched or the on-line schemes. It consists in the substitution 
of eq. (|5XI0| ) by 



dujfj = -r]^j. + a6uj§{la.st), (5.3.14) 
Sef^ = -r/— ^ + (last) 

where the "last" means the values of the 5ujf^ and dOf^ used in the pre- 
vious updating of the weights and thresholds. The parameter a is called 
momentum of the learning, and it has to be a positive number smaller than 
1. 



5.4 Practical issues 
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5.4.1 Rules of thumb 

Number of layers 

The number of layers issue depends on the specific task one wants the net- 
works to perform, but a general statement is that no more than two hidden 
layers are needed, even though very many units might be needed in these 
layers. 

Any function, no matter how complex, can be represented by a multilayer 
neural network of no more than three layers; the inputs are fed through an 
input layer, a middle hidden layer, and an output layer. This is an important 
result in that it proves that whatever is done in four or more layers could 
also be done in three. However, we note that with only one hidden layer one 
may need a very large number of units in the hidden layer. Thus, it could be 
more useful to have two hidden layers and a smaller number of neurons on 
each. Notice also that, as the sigmoid is very close to a step function, if we 
have a neural network with two hidden layers and a small number of units 
it could be useful to adopt a linear activation function between the last two 
layers to have a smoother output. 



Number of hidden units 

The number of hidden units needed to approximate a given function JF is 
related to how many terms are needed in an expansion of J-' in the function 
g{). There are several techniques to determine the optimal number of units. 
We can reduce a large number units by the weight decay approach, where 
weights which are rarely updated are allowed to decay according to 

dm 

5ij = -V-Q^ (^^ij , (5.4.1) 

where e is the decay parameter, typically a very small number, O(10~^). This 



corresponds to adding an extra complexity term to the energy function [3S 



E^E + ^J^'^^j (5.4.2) 



imposing a "cost" for large weights. A more advanced complexity term |38 
is 

where the sum extends over all weights. For large \uJij\ the cost is A, whereas 
for small weights it is zero. The scale of weights is set by c^o- In this way the 
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cost reflects the number of weights, instead of the size, hence the network 
gets pruned to only contain weights that are reafly needed to represent the 
problem. One can also begin with a small number of units, and then increases 
it one by one by trial- and- error till a stability of the error function or of a 
given indicator is reached. 

Learning parameters 

If the activation (^j) of output node i is large, the optimal r) for weights 
connecting to that unit will be small. The natural thing is to rescale the 
input data such that {C,i) ~ C'(l) for sll i, in which case they will have 
approximately the same optimal r] (even the thresholds). This also simplifies 
for the case of the hidden units where (hi) ~ 0{1) for sigmoidal units, and 
the same learning rate can be used for all weight layers. 

The momentum term a controls the "averaging" of the updatings and it 
is closely connected to the learning rate. An increase of a means an increase 
of the "effective" learning rate. The optimum a depends on the updating 
procedure used. For the batched method a is very useful and should be a 
number close to unity (0.5 < a < 1). For the on-line updating, on the other 
hand, a is often (but not always) useless. 

Time dependence 

One can choose a learning parameter that varies with time. In this way 
after the region of the minimum is reached with a large 77, the learning rate 
is reduced as to avoid to get out of there and to refine the research of the 
minimum. Notice however that if a local minimum is reached in this way one 
can not get out of there. 

Choosing patterns 

If patterns are shown sequentially, there may result a bias, e.g. due to a 
regularity of some sequential input patterns in giving a positive variations of 
weights. This is the reason why it is a good rule to choose patterns randomly. 
It is also a good rule to show a pattern as an input at least 10^ times, in 
order to allow compensations of different variations. 

5.4.2 Generalization 

One of the major features of neural networks is their ability to generalize, 
i.e. to classify successfully patterns that have not been previously presented. 
Multilayer neural networks generalize by detecting features of the input pat- 
tern that have been learned to be significant, and so coded into the internal 
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ECDMS Data Points BCDMS Data Points ECDMS Data Points 



Figure 5.3. Different trainings for a fit of the non-singlet structure function F2—F2 
for a subset of BCDMS data points. 



units. Thus an unknown pattern is classified with others that share the same 
distinguishing features. This means that learning by example is a feasible 
proposition, since only a representative set of patterns have to be taught 
the network, and the generalization properties will allow similar inputs to 
be classified as well. It also means that noisy inputs will be classified, by 
virtue of their similarity with the pure input. It is generalization ability that 
allows multilayer neural networks to perform more successfully on real- world 
problems that other pattern recognition or expert systems. 

Generally, neural networks are good at interpolation, but not so good at 
extrapolation. They are able to detect the patterns that exist in the inputs 
they are given, and allow for intermediate states that have not been seen. 
However, inputs that are extensions of the range patterns are less well clas- 
sified, since there is little which compare them with. Put another way, given 
an unseen pattern that is an intermediate mixture of two previously taught 
patterns, the net will classify it as an example of the predominant pattern. If 
the pattern does not correspond to anything similar to what the neural net- 
work has seen before, then the classification will be much poorer. Notice also 
that the generalization performance is usually lower than its performance on 
the training set, although for very large data sets the performance can be 
approximately equal. For a feed-forward network with one hidden layer, the 
generalization error is of order 0{N^/Np) where is the number of weights 
and Np the number of training patterns . 



In Fig. p.3| we show, as an example, a plot with three different trainings of 
the structure function Ff — (details will be given in the next chapter). We 
have chosen the architecture (4,5,3,1) that will be used in the following, and 
we have trained the neural network on a very small arbitrary chosen subset 
of data, i.e. 30 experimental points from the BCDMS data (see later). 

In the first plot we can see a very smooth almost constant function which 
corresponds to a very short training. Here we see that the neural network 
strongly correlates data points, but since there has been no enough training 
it does not reproduce the behavior of data. The second plot corresponds to a 
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longer training. Here we see that the neural network has found an underlying 
law from data; also in this case it strongly correlates data points, but now 
it reproduce their behavior. The last plot shows the result of a very long 
training. Here the neural network goes on top of many data point loosing 
its generalization ability. This is what is called over-learning. If the number 
of weights of the neural network is equal to the number of data points wc 
can have over-learning for a large enough number of training epochs. If 
the number of weights is less than the number of points and the number of 
training epochs is large enough we can have only partial over-learning. 
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The DIS cross section is expressed in terms of the nucleon structure function 
F2, that carries informations about the inner structure of the nucleon. If we 
were able to solve QCD in the non-perturbative domain, we could calculate 
F2 with quark masses and Aqcd as the only inputs. Unfortunately this is 
not the case at present times. However, we need a more and more detailed 
knowledge of the structure of nucleons as they are essential ingredients for 



present and future hadron colliders |40|. Note that the unpolarized structure 



function is necessary also to determine the polarized structure function from 
the spin asymmetry in polarized deep inelastic scattering. Although the 
accuracy of such experiments is not yet very high, it could be anyway useful 
for future tasks to minimize the sources of errors. We have thus to extract as 
much precise information as possible from experiments. For this purpose we 
present here an alternative approach to extract F2 from data. Results will 
also be published in Ref. |4l[ . 

The Chapter is organized as follows. First we will describe the experimen- 
tal data used in our fit with details on correlated systematic uncertainties. 
Then, we will discuss techniques to calculate errors and correlations from 
the fitted F2, and we will show how we can minimize theoretical assumptions 
on the shape of the structure functions with the help of neural networks. A 
detailed description of the behavior of neural networks will be given. Finally, 
we will show our results for the non-singlet structure function F2 — F2, and 
preliminary results on the proton and the deuteron structure functions. 



6.1 Experimental data 

We have used experimental data given by the New Muon Collaboration 
(NMC) m and the BCDMS (Bologna-CERN-Dubna-Munich-Saclay) Col- 
laboration 1^ as they measure both the proton and the deuteron structure 
functions. We have not considered data given by the E665 Collaboration 
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pi| . E665 data cover a kinematic range with small x and Q^, that is almost 
entirely excluded in the analysis performed in Chapter 7. The inclusion of 
these data would have implied a further effort, and would not have been very 
significant to our purpose. A future analysis may consider these data as well. 
Note, however, that since some of these data cover a kinematic range similar 
to that of NMC, we could use them to test the prediction ability of our neural 
networks Q. Details on errors of both the experiments are given below. In 
Fig. |6.1| we show the kinematic region explored by the experiments. 



6.1.1 NMC 



We have used data and data tables given in [^]. NMC data consist of four 
data sets for the proton and the deuteron structure functions corresponding 
to beam energies of 90, 120, 200 and 280 GeV, covering the kinematic range 
0.002 <x< 0.60 and 0.5 GeV^ < < 75 GeVl The systematic errors are: 

• the incoming (E) and outgoing beam (E') energies, fully correlated 
between the proton and the deuteron data, but independent for the 
data taken at different beam energies; 

• the radiative correction (RC), fully correlated between all energies, but 
independent for the proton and the deuteron; 

• the acceptance (AC) and the reconstruction efficiency (RE) fully cor- 
related for all data sets; 

• the normalization uncertainty (2%), correlated between the proton and 
the deuteron data, but independent for data taken at different beam 
energies. 

The uncertainties due to acceptance range from 0.1 to 2.5% and reach at 
most 5% at large x and Q^. The uncertainty due to radiative corrections is 
highest at small x and large and is at most 2%. The uncertainty due to 
reconstruction efficiency is estimated to be 4% at most. The uncertainties due 
to the incoming and the scattered muon energies contribute to the systematic 
error by at most 2.5%. 

Finally note that each bin of data has larger statistical errors in the 
extremes and lower in the middle. In particular the worse points are those 
of larger Q^. The experimental reason for this effect is related to the details 
of the geometry of the detector and to the explored kinematic region. As 
one pushes the detection to some areas at the limit of the acceptance region 
many points are lost and the statistics goes down significantly. 

^Remember that neural networks are good in interpolation, but their are not in ex- 
trapolation. Thus, it is reasonable to test them on a kinematic range similar to the one 
they have been trained on. 
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Figure 6.1. NMC and BCDMS kinematic range. 



6.1.2 BCDMS 

We have used data given in [^] and data tables given in BCDMS 



data consist of four data sets for the proton structure function corresponding 
to beam energies of 100, 120, 200 and 280 GeV and three data sets for the 
deuteron structure function corresponding to beam energies of 120, 200 and 
280 GeV covering the kinematic range 0.06 < x < 0.80 and 7GeV^ < < 
280GeVl The systematic errors are: 

• Jn = 3%, the absolute normalization error totally correlated for all 
energies and the two targets; 

• /b < 0.15%, the calibration of the incident beam energy; 

• fs < 0.15%, the calibration of the spectrometer magnetic field; 

• fr < 1%, the calibration of the outgoing muon energy; 

• fd, inefficiencies of the detector (negligible). 

The first two systematic errors sources have large effects at large x and small 
Q^. Even if the experiments for the two targets where done at different 
times, the systematic errors are fully correlated for all targets and for all 
beam energies 



the calibration of the incoming beam energy E was realized in a magnet 
on the beam line, and it was dominated by a systematic uncertainty 
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due to scintillators that measured the muon beam. This position was 
more stable in time than the precision of measurements; 

• the calibration of the outgoing muon energy was reproducible with a 
relative value of about 0.02%. This remained the same in the four years 
of the experiment, and thus remained independent of the used target 
or the beam energy. On the other hand the absolute calibration had 
an uncertainty at a level of 0.1-0.2%; 

• the resolution of the spectrometer depended on the constituent mate- 
rial, which had not changed in the four years, thus same detector, same 
magnet and same positions. It could have depended on the hadronic 
production absorbed in the magnet, as these particles perturbed the 
recognition of the scattered muon. However, the hadronic productions 
from proton and deuteron are so similar that this effect is negligible. 

BCDMS data have a further source of uncertainty due to the relative 
normalization of data between different beam energies and different targets. 
Specifically, there is a 2% cross section normalization of data taken with 
different targets, a 1% relative cross section normalization of data taken at 
different beam energies for the proton, a 1% relative cross section normaliza- 
tion between data taken at 120 GeV and 200 GeV and 1.5% between data 
taken at 200 GeV and 280 GeV for the deuteron. 



6.2 Fitting procedure 



We consider the case where we have M measurements of a nucleon structure 
function F2. The central problem is to determine F2 based on observations 



F2^\ . . . , F2"' ■ Specifically, we can introduce a hypothesis for the structure 
function F2 which depends on unknown parameters 6 = {61, . . . ,6m)- The 
goal is then to estimate parameters by comparing the hypothesis with exper- 
imental data. 

We first discuss as an example a functional parametrization of F2 (see 
J7| and references therein). A QCD inspired parametrization of F2 can be 



,(M) 



constructed by observing that the evolution equations (p.3.1|) introduce a 
logarithmic dependence on Q^, while a polynomial function could fit the de- 
pendence on X. Moreover, from the parton model we have a further constraint 
on the X behavior for x = 1. We can then take 



F2{x,Q^)=x''^ f{x,Q' 



where 



f{x,Q')=A{x) 



logQVA^ 



2/x2lB{x) r 



logQ2/A2 



1 + 



(6.2.1) 



(6.2.2) 
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and 

A{x) = (1 - x)"^ [as + 04 (1 - x) + a5 (1 - x)^ 
+ ag (1 - x)^ + 07 (1 - x)^] , 

B{x) = 61 + 62X + — (6.2.3) 

X + O4 

C{x) = Ci X + C2 X^ + C3 X^ + C4 x^ . 

The small x behavior of the function is described by the ai parameter, while 
the 02 term carries most of the information on the large x behavior. The 
reference scale is fixed arbitrarily at Qq = 20GeV^, while the value A = 
0.250 GeV is extracted from as measurements at different The last 



term of eq. (|6.2.2| ) takes account of higher twist effects (see the next Chapter 
for details). 

The total number of parameters in eq. (|6.2.1| ) is 15, and they can be 



estimated by minimizing the ^ given set of experimental data. Once 

the parameters are known we can evaluate any quantity that depends on F2, 
e.g. the asymmetry Ai or a Mellin moment. Since parameters are themselves 
stochastic variables, they are determined with an error and with a correlation 
matrix. In this way a value of the structure function is given with an error 
that is the combination of the parameter errors. Note that because of the 
non-linearity of F2 some linearization is necessary to determine the structure 
function error, and this may be a source of uncertainty. With this approach 
it may be non-trivial to obtain the errors on two given data points and 
especially their correlations given by 

cov[F2(xi, Ql), F2(x2, Ql)] = (6.2.4) 
dF2{x,Q^)dF2ix,Q' 



l,J=l -I 



cov[^^j, 6j\ 

(a;i,Q2),(x2,Q2) 



Even more complications arise if we would like to calculate functions of F2, as 
for example a Mellin moment, while it would be a very hard task to evaluate 
the correlation between two Mellin moments. A possible way to overcome this 
problem was given in |^^. Indeed, since F2 is provided with an "estimated 



error band" one might hope to get a qualitative idea of the error by taking 
the integrals of the upper and lower curves of the band as estimates of the 
error on a Mellin moment. This procedure is however meaningless. In fact, 
if we use the fitted F2 to extract, as an example, a^, we note that the error 
on as could be made arbitrarily small by increasing the number of values 
of at which the moment is evaluated (even within a fixed range in Q'^) 
|]28[| . This apparently paradoxical result is of course due to the fact that the 
procedure neglects correlations between the values of the moment extracted 
from the fit at two different scales, which tend to one as the scales get closer. 
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The Monte Carlo method is an alternative approach for calculating er- 
rors and correlations by using sequences of random numbers. A sequence 
of Monte Carlo generated values of F2 may be used to evaluate estimators 
for errors and correlations of the values of the parameters of F2. Techniques 
for constructing estimators are discussed in Appendix B.3. An important 
feature of properly constructed estimators is that their statistical accuracy 
improves as the number of values N in the data sample increases. A question 
which we will address is how large N must be to achieve an accuracy at the 
percent level on estimators. 

A functional parametrization as the one given in eq. (|6.2.1| ) introduces an 
uncertainty due to the imposed dependence on x and Q^. However, nobody 
knows which is the real behavior of F2, and any assumption on its functional 
form may be a source of uncertainty, whose exact size is very hard to assess. 
For this purpose we will consider a fit with a neural network. Indeed, a 
neural network with a given architecture can describe a structure function as 
well as, say, a demographic distribution having very different behaviors: the 
difference only depends on the input and the output data which it is trained 
with. In this case however we will have a larger number of parameters than 
with the functional parametrization. The number of parameters included in 
a fit corresponds more or less at the number of terms included in a Taylor 
expansion of F2. As nobody knows the exact expression of F2, nobody also 
knows how many terms we have to include in its Taylor expansion or the 
exact number of parameter that we need to fit it. The space of parameters 
is an infinite dimensional space, and the arbitrary choice of a fixed number 
of parameters corresponds to an arbitrary reduction of this space without 
assessing the uncertainty with which this reduction is done. As we have seen 
in Section 5.4.1 the number of neurons, and then of the parameters, in a 
neural network is chosen only by looking at the stability of the error func- 
tion without making any theoretical assumption on their number. We also 
observe that once the stability of a neural network is reached, information 
is maintained even in the case a neuron dies. In this way a neural networks 
guarantees a more robust and less arbitrary parametrization of a structure 
function. The only request is to determine the most stable and economic 
architecture for the problem at hand. Thus, a neural network fit of F2 will 
avoid both theoretical assumptions on the functional behavior of the struc- 
ture function and an arbitrary choice of the number of parameters used in 
the fit. 

Specifically, we will proceed as follows. First, we will generate N repli- 
cas of artificial data with the given experimental statistical and systematic 
correlated errors starting from the original ones. Then, we will perform a 
neural network fit of each replica. Finally, we will take the average over the 
number of neural networks, reproducing central values, errors and correla- 
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tions of the original data with an uncertainty that depends on the number of 
generated rephcas N . In this way errors and correlations between two quan- 
tities depending on the structure functions {e.g. both two values of the Ai 
asymmetry or two Mellin moments) can be calculated keeping under control 
all the theoretical biases. 



6.3 Generation of artificial data 

Artificial data for the NMC experiment are generated with 

= (l + rya^v) [i^/'^'"^ (6.3.1) 

, riE + r2E' + r^AC + riRC + r^RE 
+ * + '"^ 

where the systematic errors have been discussed in the previous section and rj 
are random Gaussian numbers. In order to reproduce the correct correlations 
we choose rs and equal for different energies and target, r4 equal for data 
of the same target, ri, r2 and r-j equal for different targets. Gaussian random 
numbers are generated with the gasdev routine given in |^ and reported in 
Appendix C. 



Artificial data for the BCDMS experiment are generated with 



Fr*) = (l+r5a^)/rT7^/rT77^[Fr^) (6.3.2 

, ri fb + r2fs+ rs fr „(exp) , 

+ ^ K '^' + nastat 

where the systematic errors have been discussed in the previous section, 
(Ttv is the absolute normalization, a^^ is the relative normalization between 
different targets, ctatj^ is the relative normalization between different beam 
energies, are random Gaussian numbers as above. In particular, ri,r2,rs 
and rs are equal for different energies and different targets, rg is the same for 
data of a given target, rj is the same for data of a given energy. 

A relative normalization uncertainty cr^r,^ between two measurements mi 
and m2 is taken into account by multiplying one measure by a/1 + a^^ r and 
the other by 1/ y/1 + r, where r is a random Gaussian number. In this 
way the product mim2 is equal to one, while their ratio gives 1 + Thus, 
when the normalization of mi increases, that of m2 decreases. The same 
result is obtained by multiplying the two measurements by -y/l + cta?^ r and 
a/1 — (Tat^ r respectively. If we generalize to many measurements having a 
relative normalization uncertainty with each other, we obtain as an averaged 



effect an additional uncertainty on the measurements, see eq. ( |6.3.2|) . The 
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Figure 6.2. (Fi 
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vs Fi and r^"'*) 



vs. r 



(es^p) iiiQ different structure functions 



with Nrep =10, 100 and 1000. 



error introduced by the relative normalization uncertainties is v^r-FoxT ~ 



We now address the problem of understanding how the central values, er- 
rors, and point-to-point correlations computed from the artificially generated 
data compare to the corresponding input experimental values, as a function 
of the number of generated replicas. A qualitative answer can be obtained 
from Fig. |6.2| , where we show the scatter plots for mean values and correla- 
tions for the proton, the deuteron and the non-singlet structure functions. 
The reason why we take into account also the non-singlet structure function 
will be explained in the next section. 

A more detailed description can be obtained by defining the following 
quantities: 

• average over the number of replicas for each experimental point i {i.e. a 
pair of values of x and Q^) 



1 ~ \<^N^- 




(6.3.3) 



The error on (Fj) 



rep 



is given by 




(6.3.4) 
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Tabic 6.1. Comparison between experimental and generated artificial data for 
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in a similar way. These estimators indicate how close the averages over 
generated data are to the experimental values. Specifically, they will 
indicate if and how the slopes of the scatter plots in Fig. |6.2| differ from 
one; 



averaged correlation: 
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Similarly we define r[a], r[r] and r[cov]. This estimator indicates which 
is the spread of data around the art. vs. exp. line in the scatter plots 
of Fig. O. 
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Table 6.3. Comparison between experimental and generated artificial data for the 
non-singlet structure function. Experimental data yield: (a^^^j = 0.0114, 
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We expect e.g. the variance on central values to scale as 1/Nrep, while the 
variance on the errors should scale as 1/ -y/ N^-ep (see Appendix B.3). From 



Tables ^]T|, ^]2| and |6]^ we see that this is approximately the case. Note 
that the exact scaling behavior is observed for Nrep ~ 10^. We see also that 
one needs about 100 artificial data to get an accuracy at the percent level 
on central values, and about 1000 to get the same accuracy on errors and 
correlations. 



6.4 Building and Training Neural Networks 

We decided to fit separately F^, F^ and F| — F^. Fitting the difference 
F2 ~ F2 is safer than taking the difference of F2 and F2 after they have 
been fitted separately. Indeed, if we want a precision on F|' — F2 of 10~^, we 
must have a precision of at least 10-'^ on F|' and F2 separately. However, 
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since the precision on F|' and is 10~^, in this way we would have a very 
poor precision, 10~^, on F2 — -F^. The fit of F2 — F2 is also delicate, as the 
non-singlet combination is very close to zero on a wide range of x, and errors 
are summed in quadrature; thus, lots of care must be taken to avoid fitting 
noise. The problem does not arise for the singlet combination + -F^*. In 
the following we outline the common features of the different fits, and then 
we will focus on details for each case. 

We have constructed a neural network with the architecture (4,5,3,1), and 
we have used as inputs x,Q^,\ogx and logQ^. The choice of taking log a; and 
log as inputs does not introduce a theoretical bias, as the neural networks 
could "decide" to ignore these inputs if they are useless. One could even 
take as an input, say, a temperature, and if this variable is useless for fitting 
the given function the neural networks would have zero weights on the paths 
corresponding to this variable. 

We have used sigmoid activation functions between the first three layers, 
while for the last layer we have chosen a linear activation function. As ex- 
plained in the previous Chapter, this guarantees a smoother behavior of the 
neural network. 

The number of units per layer has been chosen by trial-and-error, i.e. by 
adding a unit to a layer and looking for the stability of the output. In 
particular, we looked for the stability of correlation between central values 
by asking that it was more than 80%. 

We have adopted the on-line training since the number of data is rea- 
sonably small. As we pointed out in the previous Chapter, it is worth pre- 
processing data in order to avoid fitting biases. In particular, we should not 
take data in the sequential given order, but it is better to pick them ran- 
domly. For this purpose we have used the idirty random number generator 
of Numerical Recipes |^ reported in Appendix C. Such a generator has a 
periodicity that may cause oscillations in the sampling of data. However, we 
have checked that oscillations due to the periodicity of the random number 
generator do not sensibly affect the fit. 

The only theoretical assumption on the shape of F2 has been the kine- 
matic request F2{x = 1, Q^) = 0. This has been done by artificially adding 
10 points at a; = 1 with equally spaced values of Q^. The choice of the error 
on these points is very delicate, because if it is too small the neural networks 
would spend a lot of their training time in learning these points. One would 
obtain a very precise fit of the kinematical constraint F2{x = 1, Q"^) = 0, and 
a worse fit of the experimental data. We have thus taken an error of the 
same order of the smallest experimental error. In particular, we have taken 
10~^ for F2 and where data are very different from zero within errors. 
For F2 — F2 since the structure function is very close to zero, we have asked 
a higher precision on these points by setting the error equal to a/210~^. 
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As explained in Appendix B.4, the quantity we want to minimize in a fit 

is the Covariance Matrix Estimator (CME) given by 

p m 

^2^J2 (^^(^') - ^nV.j\o,i^n - z^) , (6.4.1) 

as in this way not only statistical errors, but also correlated systematic un- 
certainties are taken into account. The updating rules given in eq. (|5.3.10|) 



that govern the neural network learning, show that the variations on a sin- 
gle data point very weakly affect a given weight. Thus, to have a stronger 
effect we can switch to the batched training mode in which all the weights 
are updated after all patterns have been presented to enhance the effect of 
variations. However, the energy definition given in eq. ( |6.4.1| ) makes a sum 
over all patterns appears on the r.h.s. of the updating rule eq. ( p.3.10| ) and its 



evaluation deeply affects the rapidity of the training. We can then minimize 
the simpler case where the energy is defined as the Simplest Estimator 
(SCE), 



If systematic errors are not predominant with respect to the statistical ones, 
eq. ( |6.4.2| ) can be a good estimator of eq. ( |6.4.1| ) (see Appendix B.4). We will 



not take care of correlations when we fit each neural network to a replica of 
artificial data, since they are produced anyway by the Monte Carlo. In order 
to increase the efficiency of the training we can perform it in two cycles. First 
we minimize the error function eq. (|5.3.5|) , and then we minimize eq. ( |6.4.2 ). 



This procedure corresponds to starting with a coarse search for the minimum, 
and then refining it once its neighborhood has been located. In the first 
example, for the first cycle we have used t] = 0.004 and a number of epochs 
of 2 X 10^. For the second cycle we have taken rj = 4 x 10~® and a number 
of epochs of 4 X 10^. For every cycle the value of the momentum term was 
a = 0.9. As the number of epochs is larger than 6 x 10^, every pattern may 
be seen at least 10^ times. Henceforth we will label a training by a shorthand 
for the number of epochs of the second cycle; in the present case it is 4M. 

As we have seen in the previous chapter, we expect neural networks to 
produce lower errors because they interpolate smoothly. Actually we expect 
them to reduce more the statistical errors than the systematic ones. An 
example is the problem of interpolating data randomly distributed around a 
horizontal line. If we fit these data with, say, a parabola, the result of the 
fit will be a parabola and not a horizontal line. If we instead take a set of 
neural networks, that do not have a definite functional behavior, we expect 
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that all distributions will have fits very similar and close to the horizontal 
line. Average over fits will give smaller errors to each point. If on top of 
the statistical error a global shift is added, then the fits will follow this shift. 
It is clear that infinite training will make the fit to go on top of each point 
and then all errors would be reproduced. But then the neural network is no 
longer assuming continuity or capacity of generalization. For that purpose a 
cubic spline fit would do the job. 

The question which now we address is how the central values, errors, and 
point-to-point correlations computed from the neural networks compare to 
the corresponding data (experimental and artificial), both as a function of 
the number of replicas and of the length of the training, and both when 
comparing neural networks one by one, or their average. We then define the 
following quantities {Nrep = Nnet)' 

• we define an averaged over the number of points for practical reasons. 
As the number of points is of order 500 and the number of parameters 
is of order 50, this quantity differs for a 10% from the Xd.o.f.- 
last section we will show results also for Xd.o.f.- ^^^^ define: 

-l Nnet / rp(k,net) j-,{k,art) \ ^ 
^in.-ar^ = ^ ^2 i ^ ^ , (6-4.3) 



net 



k=l 



Nnet / T-i(fe,net) Tp{exp) ^ 



^^net / -p^yn^ll^l) jp\ 



and 



{net)\ _ /p{art)\ 



2{net-art) _ -*- ( \ ' /net \ ' / net \ (aA K\ 

^^^^ ~ N~~ ^ > ^- ' ^ ' 



^points 



2 (net—exp) 
XSCE 



■^points 




2 



i=l 

where CTj is the experimental statistical error; Xcme defined in an 
analogous way with the inverse of the covariance matrix. In the follow- 
ing we will use always Xsce^ unless otherwise explicitly stated. The 
reason why we have given two different definition of will become 
clear in the following. Note that the average of T^2(net-art,ea:p) ^^^^ 
number of points gives a result different from -i^2(net-art,ea:p)^ ^ ^ ^j^g 
averages are not commutative. Specifically, ^"^(^^^-"-^^'^^p^ indicate how 
each neural network fits the corresponding replica, while -1^2 (net-ar-t,ea;p) 
refiect the quality of the fit of experimental data or the average over 
the replicas as obtained from the average over the neural networks; 
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averaged correlation: 

p|^(net-art)j _ 



1 



(j{net) (j{art) 



jp{net)\^ / jp{art)\ 

net \ / net / points 



p{net)\ \ //p{art}\ 

I net! points W I net I points 



(6.4.7) 

Similarly we define ^[o"*^"^*"'^''*'*], ^[r*^"'^*"'^'"*^] and rlcov^"^*""*"*)]. 



• variance and percentage error are defined as in the previous section 
with obvious replacements 

As an example we first consider a training of 1000 neural networks on 
non-singlet data with the learning parameters discussed above. Results are 
collected in Table |6]^. First, we observe that one can reach a -i^2{net-exp) ^ 
^{net-art) ^ ^ with an 80% correlation between neural networks and data 
(experimental or artificially generated), but the mean percentage error on the 
errors remains about 90%. The errors and correlations are only correlated to 
about 40%. Furthermore neural networks errors are systematically lower and 
correlations systematically higher by about a factor 4-5 than the experimental 
ones. Neural networks are producing a statistically good fit, but they are 
smoothing. 

In order to understand whether the smoothing reproduces an underlying 
law or not, we will describe a toy model. Let us first consider a measured 
value vni of F2 where i represents a pair of values (x, Q^); we have 

mi=ti + aiSi (6.4.8) 

where Si is a univariate Gaussian number, tj is the true value of F2, and CTj 
its error. The k^^ replica of generated data gives 

5ff = mi + rfc (Ti = tj + {si + rk)(Ji , (6.4.9) 

where is a univariate Gaussian random number. If the neural networks 
succeed in finding the true value tj with an error at smaller than the experi- 
mental one, for the k*^ neural network we have 

nf^=U + r'^ai. (6.4.10) 

The variance between the neural networks values and the experimental data 
or the replicas is given by 



=slal + a^. (6.4.11) 

rep 

- E {9^^ - nf^y = (1 + s^) + , (6.4.12) 



^rep 
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Table 6.4. Comparison between neural networks and experimental data for the non- 
singlet structure function with a 4M training. 



If we now take the divided by the number of points, we get 
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(6.4.13) 



(6.4.14) 



thus, 7^ ^ i if (o-) < (cr). From Table |6]| we have that {a) /{a) ^ 0.4, and 
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7?. ~ 0.54. As a consequence 

/•Y2(net-exp)\ ^ 2 /;^2 (net-art) \ _ (6.4.15) 

\ / points \ / points 

If we first take the average over the rephcas and then the variance between 
the neural networks values and the replicas ones or the experimental data, 
we find 

{nii - {ni)repf = Cr^ S- , i{gi)rep " {ni)repY = CT^ S- , (6.4.16) 

and 

^2{net-exp) _ ^{net-exp) _ ^ (6.4.17) 



Table |6.4| beautifully shows this behavior. 

To understand why the error of neural networks can be smaller than the 
experimental one, and how we can recover this loss of information, as a toy 
model let us assume that the experimental data on F2 satisfy an underlying 
linear law as a function of, say, the Bjorken variable x, 

mi = Cxi + D. (6.4.18) 

Given M + 2 data points, two points will be sufficient to determine the linear 
parameters C and whose error is (cr). All the other points will only play 
the role of reducing the error on C and D by a factor since they are all 
measurements of the same quantities C and D. If we take (a) = -^{o'), 
from eq. ( |6.4.14|) we have 



M + IM M+11 

If M = 0, the neural networks obviously correlate a point only with itself, 
giving TZ = 1 and an undefined uncertainty; if M — > 00, the neural networks 
correlate all points giving a null error and TZ = 1/2. 

Obviously F2 is neither a function of x only nor is it linear. Anyway the 
neural networks exhibit a finite correlation length ~ -4= and reduce the error 
on each point. Indeed, as {a) / (a) ~ 0.4, we have that the neural networks 
correlate at about 10 points lowering their error at about by a factor 1/3. 
From Fig. |6.3| we see the way the neural networks behave. We have that 
BCDMS points coming from data sets with different beam energies, but 
with the same values of x and are seen as the same point. On the other 
hand we see also that points of NMC and BCMDS that should have zero 
correlation between each other, are strongly correlated with all the points 
with similar values of x and independently of the experiment. 
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Figure 6.3. Neural network correlation length for x and . 



6.4.1 Non-singlet 



We now turn to specific details on the fit of the non-singlet structure function. 
From the short training example given in Table |6.4| we see that (net-exp) 
for NMC is excellent (~ 0.6), but correlations and errors are very poorly 
reproduced, while (net-exp) BCDMS is good (~ 1.4), but not as good as 
for NMC, and correlations and errors are well reproduced. In order to better 
explain the neural networks behavior we will study the relative impact of 
the two experiments on the training of the neural networks. Here we will 
consider the training of only one neural network on the experimental data. 

First we find that the length of the training on NMC does not improve 
errors and correlations for NMC. Then increasing the training for BCDMS 
one would expect to improve -1^2 (net-exp) BCDMS while deteriorating 
^2(ne4-exp) ^^Q^ Howcvcr, whllc BCDMS does improve, NMC does not 
deteriorate. If we now fit the neural networks to BCDMS only plus the 
points at x = 1, we find that -1^2 (net-exp) NMC is just as good as before. 
We conclude that the NMC points have essentially no impact on these fits: 
the corresponding central values are "predicted" by the neural networks fit- 
ted to BCDMS. Thus neural networks errors and correlations for NMC have 
nothing to do with their experimental values. The neural networks trained 
on both experiments only exploit the BCDMS data, while the NMC data 
have little or no impact on them. 

We then address the problem of reducing -1^2 (net-exp) BCDMS where 
the mediocre quality of the BCDMS fit is mostly due to the large x region 
that is not very well reproduced. Notice that if we increase the error on the 
points at X = 1, this does not help in making the fit of the remaining points 
easier {i.e. the ^2 (net-exp) ^j^g remaining points remains more or less the 
same). If we now increase the training, say at 40M, and keep fitting to 
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Figure 6.4. Training with BCD MS 
data. 
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Figure 6.5. Training with NMC data. 



BCDMS only, the ability of the neural networks to predict NMC deteriorates 
very fast as -1^2 (nei-exp) BCMDS improves. Roughly, when BCDMS is 

down to ;^2(net-ezp) ^ already at ;^2{net-exp) ^ 2.6. 

However, it turns out that we could get an equally good fit for the two 
experiments by not excluding the NMC data completely, but also not showing 



them the same number of times as the BCDMS points (Fig. |6]^). We find 
that a rather long training (of order 150M) is required in order to get a 
good -1^2 (net-exp) BCDMS. Wc obtain a good equilibrium between the two 
experiments by showing the NMC data 10% of times. Specifically, with 180M 
training cycles, we get 

2{net—exp) 2{net—exp) 2{net—exp) p. qq . on\ 

XbCDMS ~ XnMC ~ XbCDMS+NMC ~ U-oo • [0.4:. ZO) 

In order to further understand the relative impact of the two experi- 
ments we can show some more fits. Specifically, to the fit training the neural 
network only on BCDMS data (Fig. |6.4|) , we add a fit training the neural 



network to NMC data only (plus the 10 points at a; = 1), see Fig. |675| . The 
interesting result is that in both cases the neural network trained on one data 
set successfully predicts the other data set. The T^2(net-exp) ^^^e predicted 
data set is in both cases about 1.5 after a long enough training. In particular 
the very bad behavior of the predicted NMC as the ;^2(net-exp) BCDMS 
improves is a feature of short-medium training, 40M, as the training goes 
on also the predicted NMC improves again. Thus, the neural network can 
use either of the two experiments to predict the other one. Of course, if the 
experiment with smaller errors is used to train the neural network, the other 
one is predicted also with smaller error. In order to show the relevance of 
the fitting procedure, where the NMC data are only shown 10% of times, 
we can now consider the training where data are given equal weight, but 
letting it run for a very long training, lOOOM training epochs, see Fig. 
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Figure 6.6. Training with BCDMS Figure 6.7. Long training with NMC 
data and 10% of times with NMC data and BCDMS data with equal weight 



We see that the BCDMS does improve constantly, so that also in 

this case the -1^2 (net-exp) ^^^e two experiments would eventually intersect 
for long enough training. However, this way of training is both inefficient (it 
takes very long to find the intersection) and also subject to the criticism that 
at the intersection point the neural network is almost certainly over-learning 
(presumably at the intersection ~ 0.6). So training to NMC only 10% of 
times is a trick which helps both in making the training faster, and in ob- 
taining the same -Y^C^et-exp) ^^iq two experiments at a value where there 
is no over- learning yet. 

Finally, we look again at the good training, i.e. 180M and 10% of times 
on NMC. As we have noted already the -1^2 (net-exp) ^j-^^ average over the 
neural networks is good and essentially equal for both experiments. The 
errors and correlations are well reproduced for BCDMS, but very poorly for 
NMC (no correlation between neural networks average and data and so on). 
The reason why the errors and correlations are poorly reproduced for NMC 
is that that the local information provided by each NMC data point is very 
weak, and individual NMC points carry little information on the shape of F2. 
In other words, a few NMC points are sufficient to train the neural network 
and the remaining ones do not provide significant extra information. As a 
consequence, the values of error and correlations for each individual point (or 
pair of points) have little or no impact on the neural network. This fact can 
be seen in the fit where we use all BCDMS data, but only 20 NMC points 
(7% of NMC points arbitrarily chosen among those where the systematics is 
less than the statistical error). Fig. |6.8| . This fit is as good as the fit where 
all NMC data are kept. 
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Figure 6.8. Training with BCDMS data and 20 points of NMC 



6.4.2 Proton and Deuteron 

Fits of proton and deuteron structure functions are more complicate. Here we 



will outline only some interesting features. Tables |6]^ and |6.6| show the results 
for a lOM training of 100 neural networks for the two structure functions. 
We have used the same learning parameters for the non-singlet structure 
function fit, while the number of epochs on the first cycle has been taken 
equal to 4.6 x 10^. The data sets of the two experiments have been shown 
an equal number of times. 

First we observe that the behavior of neural networks is approximately 
the same for and We note also that there are differences from the 
fit of F2 — F^. Indeed, the ratio TZ for the two experiments is of order 
1. Specifically, we have that TZnmc = 0.77 and TZbcdms = 1-51 for the 
proton and TZnmc = 0.84 and TZbcdms = 1-05 for the deuteron. This 
means that while for NMC the neural networks perform a smoothing, for 
BCDMS they do not. In particular, we find that the neural network errors for 
NMC are smaller than the experimental ones, while correlations are larger. 
However, their ratio is smaller than in the non-singlet case. Thus, neural 
networks are smoothing, but less than in the non-singlet case. For BCDMS 
neural network errors and correlations are very close to the experimental ones 
showing that no additional correlation has been added. BCDMS data are all 
necessary to constraint the neural network behavior. Some other features 
can be summarized as follows: 



• when we increase the training for both the proton and the deuteron, 
Xbcdms always decreases (it can become less than 1), while Xnmc^ 
although initially decreases faster, saturates at a value of about 1.3 for 
the proton and 1.2 for the deuteron; 
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for both the proton and the deuteron Xnmc already minimized by the 
first cycle; it very little decreases with the second cycle, and sometimes 
if the learning rate is too large (order 10~^) it increases. The situation 
is the same if we fit only NMC excluding BCDMS data points; 

systematic correlated errors do not play a significant role, as Xcme is 
not much less than XscEi 

correlations for central values, errors and correlations, are better then 
in the non-singlet case, although here we have only 100 neural networks; 
the scatter plots in Figs. |6.9| and |6.10| show that for both the proton and 



the deuteron we have a good agreement on central values. However, 
the statistical experimental errors are very small compared with the 
differences between the experimental values and the fitted ones, and 
this is why the is still bad although all the other estimators are 
good. 



6.5 Results 

We consider the case of neural networks trained over 1000 replicas of the ex- 
perimental data. The learning parameters are the same used in the previous 
examples, i.e. t] = 0.004 for the first cycle, rj = 4 x 10~^ for the second, the 
momentum term a = 0.9. We have trained the neural networks 4.6 x 10^ 
times for the first cycle, 180 x 10^ for the second one. We have shown the 
NMC data 10% of times. As we have discussed in the previous Section, 
this is only a trick to improve the training time. Results for the non-singlet 



structure function are collected in Table |6?7|. We observe that: 



Xcme — ^ each experiment, i. e. our fit differs from data for less than 
one standard deviation, even properly keeping into account correlations 
between systematic uncertainties. The same result occurs if we consider 
the Xd.o.f. whose values are collected in Table |6l8| . As we have used 552 



experimental points and 47 parameters (38 weights and 9 thresholds), 
the number of degrees of freedom is 505; 

central values are correlated each other at the level of 81%; the fit over 
BCDMS data is more correlated (95%) than that of NMC data (69%), 
according to what we have shown in the previous section; 

the loss of information due to the smaller error given by neural networks 
is regained in the increasing of correlations. As we have shown this 
effect is due to the ability of neural networks of finding an underlying 
law, 7^ = 0.58; 
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• the mean value of the covariance between data points is approximately 
the same for the fit over NMC and NMC+BCDMS data, while it is 
exactly the same for BCDMS data. 

Results for proton and deuteron structure functions are forthcoming. 
Once they are ready, they will be available through a package to be em- 
bedded in a code for practical purposes. A web site will also be realized for 
an on-line distribution of the fitted data. 
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Figure 6.9. Comparison between neural networks and experimental data 
for the proton structure function with a lOM training. 
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Table 6.5. Comparison between neural networks and experimental data for 
the proton structure function with a lOM training. 
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Figure 6.10. Comparison between neural networks and experimental data 
for the deuteron structure function with a lOM training. 
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Table 6.6. Comparison between neural networks and experimental data for 
the deuteron structure function with a lOM training. 
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Table 6.7. Comparison between neural networks and experimental data for the non- 
singlet structure function with a 180M training. 



NMC+BCDMS NMC BCDMS 



SCE 0.91 0.96 1.04 

CME (186 0.93 0.97 

Table 6.8. Xd o / f^'^ SCE and CME with 505 degrees of freedom for all the exper- 
imental points, and for each experiment. 
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7 



Determination of ag with 
truncated moments 



The strong coupling constant is the only free parameter in the QCD 
Lagrangian, and more and more accuracy is needed in its determination. As 
shown in Fig. 7A. there are several experimental processes from which we can 
extract a^. Deep inelastic scattering is one of the prime ways of determining 
the strong coupling constant a^. 

Here we will show how the technique of truncated moments can be used 
for a determination of the strong coupling constant by keeping under control 
theoretical biases. We will take the fit of the non-singlet structure function 
given in the previous Chapter to obtain the truncated moments of the non- 
singlet parton distribution at the initial and the final scale. Then the value 



of as will be extracted from the evolution of truncated moments pT 



7.1 The DIS phenomenology 

In this Section we will complete the phenomenology of Deep Inelastic Scat- 
tering processes introduced in Chapter |^. Specifically, we will introduce and 
describe possible effects of theoretical uncertainty, and the techniques we 
adopt to keep them under control. 



7.1.1 Target Mass Corrections 

Within the operator product expansion [^] the structure functions are given 
by the sum of contributions coming from operators of different twists. For un- 
polarized lepton scattering only even twists larger or equal to two contribute. 
Thus keeping the twist-4 contributions into account, we have 

F2{x,Q) = F^ {x,Q) + ——, (7.1.1) 
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Table 7.1. Present status of the determination of ag as given in Ref. 



where Fr 



LT,TMC 
2,L 



gives the leading twist (LT) including target mass corrections, 



as calculated in Ref. [p3| : 



12 



4 4 /•! 

m X ' 



r5 



fdi' f di"F,{i'\Q'), (7.1.2) 
Ji Ji> 



where 



2x 



1 + v^' 




4x2 



(7.1.3) 



and F2 is the structure function of twist 2 eq. ( p.2.11| ). This approach allows 
us to separate pure kinematical corrections, so that the functions H{x) cor- 
respond to "genuine" or "dynamical" contribution of the twist 4 operators. 

There is a well-known difficulty in eqs. (|7.1.2| ) at x = 1 (see e.g. discussion 
and references in IQ): in fact, when x = 1 the structure functions should 
vanish for kinematical reasons, while the r.h.s. of eq. ( [7.1.2|) is clearly nonzero, 
since ^{x = 1) < 1. Indeed, in the large x region dynamical higher twist 
corrections become important and cannot be neglected any more. This is 
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because the twist 2 + 2k contribution to the ra*'^ moment of a generic structure 
function has the form 

BUQ')I^^)\ (7.1.4) 

where A is a mass scale of the order of a few hundreds MeV, and the coeffi- 
cients BkniQ'^) have no power dependence on n, k or Q^. The crucial feature 
of eq. ( [7.1.4| ) is the presence of a factor n^, which arises because there are 
at least n twist 2 + 2k operators of a given dimension for each leading twist 
operator of the same dimension. One can prove that the behavior of struc- 
ture functions in the x ~ 1 region is governed by moments n = 0{Q'^ /m?)\ 
in fact, when x = 1 and w? jQ^ <^ 1 we have 

2 

m 

on the other hand, if we assume a (1—^)"^ behavior for the structure function, 
with 02 of order 1, its n*'' moment receives the dominant contribution from 
the region 



(7.1.5) 



4 ~ ~ 1 . (7.1.6) 

a2 + n n 

Comparing eqs. ( [7.1.5|) and (|7.1.6| ), we obtain that the relevant moments for 
the X = 1 region are of order 

n = — (7.1.7) 

as announced. Inserting this in eq. ( [7.1.4] ), one immediately realizes that the 
contribution of dynamical higher twists is no longer suppressed by inverse 
powers of when x is close to 1, and we cannot expect eqs. (|7.1.1|) and 



( |7.1.2|) , to hold in this region. A solution to this problem is that of expan- 
ding the result in powers of rn? jQ^ up to any finite order. In this way, the 
dangerous contribution of terms with large powers of rn? /Q"^ is not included. 
The expansion remains reliable even when is as low as m? , provided x is 
not too large; in fact, powers of rn? /Q"^ always appear multiplied by an equal 
power of x^. The expanded result of course cannot be reliable at x ~ 1, but 
this would not be the case even without expanding in m^/Q^, since we are 
not including the contributions of eq. (|7.1.4|) , which are important in this 
region. 

There are several attempts to give a theoretical estimation of dynamical 
HT (see e.g. |^5[), as well as to extract HT contribution from data taken at 
low (see e.g. |^B|). Meanwhile, due to the fast fall of the HT contribution 
with Q^, it is significant for < 10 GeV^ only. Thus, if we choose a kine- 
matic range given by xo = 0.01 and > 20 GeV^ target mass corrections 
and dynamical HT will be numerically negligible, even in the large-x limit. 
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7.1.2 Renormalization and factorization scales 



As we have seen in Sect. p.2| F2 is given by the convolution of coefficient 
functions and parton distributions which both depend on a factorization 
scale fip. A dependence on the renormalization scale is introduced by 
NLO QCD approximation of and by the NLO splitting functions, when 
we consider the scale evolution of parton distributions with the Altarelli- 
Parisi equations. Since we use the truncated perturbative series, the results 
depend on the factorization scale and the renormalization scale 

Only a limited set of Mellin moments of the NNLO Altarelli-Parisi split- 
ting functions , as well as some asymptotes, are known . Nevertheless, 
there are attempts to analyze the DIS data in the NNLO QCD approximation 
considering the available moments only ||5^, or modeling splitting functions 
60| . Our analysis is performed in the NLO QCD approximation with the 



use of the splitting and coefficient functions in x-space as they are given 
in Refs. |]T^, and the DIS scheme change given in |2^. We also set 

Since we work in the DIS factorization scheme F2 is given by eq. ( ^.4.181 ). 
The parton distributions are defined at the physical scale of F2, and the 
factorization scale is identified with the renormalization scale. As we will 
consider only the non-singlet structure function, we will not need a prescrip- 
tion for the gluon (see Sects. 3.4.1 and 4.3). We are then left to estimate the 
theoretical uncertainty due to the choice of in the evolution equations. 
This will be done using the approach described in Ref. ||6T|. In accordance 



with this approach the renormalization scale is chosen equal to k^Q 
The effect of a variation of the scale at LO in is given by 



l + |^a.(M|) log 



(7.L8) 



Thus, the NLO evolution equations are modified in the following way 



(7.1.9) 



27r 



JNS). 



x,Q'). 



where P^^-* and P^^^ are respectively the LO and NLO splitting functions. 
This contribution can be compensated at NLO by a redefinition of the NLO 
splitting function 



p(i) 



P^^ + boP^^^logk 



(7.1.10) 
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In the following analysis we will take a variation of from 1/4 to 4 which 
gives an estimate of the error due to renormalization scale uncertainty. By 
definition, this uncertainty is connected with the impact of higher order terms 
of the perturbative series. 



7.1.3 Elastic contribution 

Elastic contributions are taken into account with the nucleon structure func- 
tion, e.g. of the proton, given by p2| 



where Ge and Gm are the elastic form factors, r = Q'^/AM'^ with M the 
proton mass, tq = 1/0.71 GeV~^ is the scale of the proton radius. 

This contribution may be significant in our analysis thus spoiling its ac- 
curacy. Indeed, if we take the lower integration bound of truncated moments 
to be large, and we fit a higher moment, the elastic contribution to the fitted 
truncated moment may reach 20%. To make this effect negligible we will 
consider only the fit of the first 10 moments. 



7.1.4 Evolution uncertainties 

In Sect. 4.4, we have given an estimation of the theoretical uncertainty 
introduced by truncated moments. There we used a toy model PDF to have 
an upper bound on these uncertainties. Here we reproduce those checks for 
the non-singlet PDF for all moments that will be considered in our analysis. 
As the input PDF we take the one obtained from the neural networks fit of 
F2 discussed in the previous Chapter. In the DIS factorization scheme we 
have 

X 



In Table |7.2| we show the percentage errors on the r.h.s. of the LO evolution 
equations for xq = 0.1 that exhibits the largest errors on the evolution. As 
expected the values of the percentage errors are less than those given with 
the toy PDF, even if of the same order and higher moments have negligible 
errors on their evolution. We checked that the results hold also at NLO. 
The value of xo that we will use for our final analysis is xq = 0.01. In this 
case even for the fitted non-singlet PDF, we have 7?.^ ~ 7^^ m n ^ 0.1% as 



reported in Table 4.5 
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Table 7.2. Comparison between percentage errors on the LO evolution equation for 
M =11 and N = 6. 



7.2 Fitting procedure and results 

The technique of truncated moments requires a high numerical precision in 
the computation of the evolution matrices (see Sect. 4.4). Thus, the following 
results are obtained with 11 truncated moments, i.e. the maximum number 
of truncated moments we can use to avoid numerical uncertainties on the 
evolution. The evolution is performed between an initial scale of 20 GeV^ 
and a final scale of 70 GeV^. A logarithmically equally spaced intermediate 
scale is used. Details on these choices will be discussed at the end of the 
section. We will show both results with xq = 0.01 and Xq = 0.1. We checked 
that with Xq = 0.1 higher twists effects as well as elastic contributions are 
negligible. We do not take into account the fit of the first truncated moment. 
This is because the exact first Mellin moment does not evolve and does not 
affect a fit of a^. The same holds for the first truncated moment with a small 
value of Xq. We also observe that the percentage error on the evolution of 
the first truncated moment is significant for large values of Xq, thus spoiling 
the precision of our analysis. 

We generate truncated moments of the LO and NLO splitting functions, 
as well as the matrices R and that diagonalize the evolution equation 
given in eq. ( |4.4.27| ), with a Mathematica code Results are passed to a 



FORTRAN code that reads the parameters of the neural network fit of F2 
and that performs the minimization of the with the MINUIT routine |]6^ . 
The errors on reported in this section are the statistical errors given by 
MINUIT. 

We will start our analysis by fitting as and a single truncated moment. 



100 



7.2- Fitting procedure and results 



as 



n 




Xo 




0.1 






= 0.01 


Z 


U 


uy 14 


1 

± 





0469 


U 


Uoob 


± 


0.0798 


3 





1002 


± 





0240 





1059 


± 


0.0311 


4 





1131 


± 





0187 





1153 


± 


0.0193 


5 





1222 


± 





0151 





1225 


± 


0.0153 


6 





1266 


± 





0142 





1266 


± 


0.0142 


7 





1286 


± 





0146 





1286 


± 


0.0146 


8 





1294 


± 





0160 





1294 


± 


0.0160 


9 





1292 


± 





0184 





1292 


± 


0.0184 


10 





1282 


± 





0217 





1282 


± 


0.0217 


11 





1264 


± 





0260 





1264 


± 


0.0260 



Table 7.3. Values of the fitted as^M'^) for the fit with a single moment for different 
values of xq. 



Xo = 


0.1 


Fitted moments 


as 


3+4 


0.1369 ± 0.0106 


3+5 


0.1360 ± 0.0115 


3+6 


0.1337 ± 0.0125 


3+7 


0.1306 ± 0.0135 


3+8 


0.1273 ± 0.0143 



Table 7.4. Values of the fitted as(M^) for the fit with a pair of moments. 



From Table we first observe that all central values, whenever different 
from the world average, are compatible with it at least within Icr. We see 
also that lower moments have a central value lower than 0.119 and large 
errors. The error is minimum for n = 6. As we expect, higher truncated 
moments are independent from the lower integration bound xo- In the chosen 
kinematic range from eq. ( |7.1.6| ) we have 

n ao , ^ 

x~ ~1-— . 7.2.1 

a2 + n n 

Thus, only the lower truncated moments are sensitive to the small-x region. 
Notice that if we fit a single truncated moment we can not take into account 
all the available information we have from data. 

Let us consider a fit of pairs of moments. As an example, we have collected 



results with xq = 0.1 in Table [7.4j . We observe that as obtained by fitting 
simultaneously two truncated moments slightly differs from the weighted av- 
erage of the as given by the case in which we fit the corresponding truncated 
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■ 


0.2 


Fitted moments 




2+4+5 


0.1099 ± 0.0079 


3+5+7 


0.1169 ± 0.0079 


2+5+7 


0.1130 ± 0.0079 


2+3+4 


0.0949 ± 0.0081 


3+6+9 


0.1152 ± 0.0092 


2+4+5+6 


0.1046 ± 0.0077 


2+3+5+7 


0.0874 ± 0.0100 



Table 7.5. Values of the fitted as{M'^) for the fit with different combinations of 
moments. 

Xq = 0.1 



Fitted moments a 



2+3+4 


0.1279 ± 0.0082 


2+3+5 


0.1188 ± 0.0097 


2+4+5 


0.1214 ± 0.0104 


2+4+6 


0.1200 ± 0.0104 


3+4+5 


0.1202 ± 0.0117 


3+5+7 


0.1207 ± 0.0153 


2+3+4+5 


0.1160 ± 0.0067 


2+4+5+6 


0.0953 ± 0.0096 



Table 7.6. Values of the fitted as{M'^) for the fit with different combinations of 
moments. 



moments one by one. The central values are different and the errors are 
smaller. Thus correlations play an important role. As we expect, the effect 
of correlations is higher for neighboring moments, while it is smaller for dis- 
tant ones. In particular, we have that if moments are weakly correlated the 
fit is closer to the fit of the single more precise moment. We have checked 
that this behavior is independent of the chosen value of xq. 

Results for different combination of fitted moments are given in Ta- 



bles [7.5| , |7.6| and [7.7| . In this case we also show for completeness the results 
for xo = 0.2 that is close to the x cut of the fit given in At first sight 

we note that we have the smallest error on as when the number of fitted mo- 
ments is the largest, i.e. Xq = 0.01 and 6 or 7 truncated moments. However, 
from Table |7.7| we observe also that there is a trade-off due to the fact that 
we have a limited amount of information: 6 truncated moments and 3 scales 
is the number of parameters sufficient to extract the whole information from 
data. Increasing the number of parameters does not improve the fit. 
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n 1264 + n nin7 


2+3+4+5+6+7 


0.1279 ± 0.0047 


2+4+5+6+7+9 


0.1281 ± 0.0049 


2+3+4+5+6+8 


0.1281 ± 0.0049 


1+3+4+5+6+9 


0.1279 ± 0.0051 


1+2+4+5+6+7 


0.1243 ± 0.0055 


2+3+4+5+7+8+9 


0.1279 ± 0.0049 


2+3+4+5+6+8+9 


0.1282 ± 0.0050 


2+3+4+5+6+7+9 


0.1286 ± 0.0050 



Table 7.7. Values of the fitted as{M'^) for the fit with different combinations of 
moments. 

The number of fitted moments is sometimes different for different values 
of Xq. When Xq is large the x-range is very narrow and moments are strongly 
correlated. As a consequence the correlation matrix is almost singular since 
all its elements are close to one. Thus, the numerical precision on the inver- 
sion of the correlation matrix may fail. In Appendix B.4 we show that under 
these conditions also pathological effects on the minimization of the i^^iy 
arise, i.e. values of with extremely small errors and pathological best fits 
of moments. When xq = 0.01 the x-range is wider and even if correlations 
still play an important role, they are generally not so strong as to give rise 
to pathological effects. The best value of as is given by 

a,(M|) = 0.128 + 0.005. (7.2.2) 

As an example, for the case of the fit with 7 moments, in Table [7.8| we 
give also asymmetric errors. In order to be sure that no local minima are 
present in the parameter space, we explore the region of the minimum for 
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27 



26 — 



25 



24 — 



23 



22 — 




21 
0.115 



0.12 



0.125 



0.13 



0.135 



0.14 



Figure 7.1. vs. as{Ml) with xq = 0.01, fit 2+3+4+5+6+8+9 and 15 d.o.f.. 



Xo = 0.01 



Fitted moments 



2+3+4+5+6+8+9 0.1282^°;°°^^ 
2+3+4+5+7+8+9 O.mdt'^oZtl 
2+3+4+5+6+7+9 0.1286+°;°°^^ 

Table 7.8. Values of the fitted as{M'^) with asymmetric errors for the fit with 7 
moments. 



as- From Fig. ^!T]we can see that we have a well located parabolic minimum 
as confirmed by asymmetric errors. 

We then address the choice of the number of scales and of the range. 
We need at least two scales to fit one moment and a^. Additional intermedi- 
ate scales increase the number of points of the fit, and thus reduce the error. 
However, we can not increase arbitrarily the number of scales, especially 
when we fit several moments simultaneously. We have seen already that we 
have a limited amount of information from data, and, as a result, we must 
use a limited number of truncated moments as well as of intermediate scales. 

As it is shown in Table 17^ when we take the simultaneous fit of sev- 



eral moments, the error is smaller for 3 scales, than for 2 scales. Moreover, 
increasing the number of scales to 4 has a worse effect than increasing the 
number of fitted moments. Since a moment at different scales is more cor- 
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Xo = 0.01 


Fitted moments 


N. of Scales 


as 


2+3+5+6+7 


2 


0.1193 ± 0.0077 


2+3+5+6+7 


3 


0.1284 ± 0.0050 


2+3+5 +7 


4 


0.1051 ± 0.0071 



Table 7.9. Values of the fitted as{M'^) for the fit with a different number of scales 
and moments with xq = 0.01 



Xo = 0.1 


ranj 


^e as 




20-70 


0.1131 ± 0.01^ 


^7 


20-100 


0.1091 ± 0.0176 


30-100 


0.1071 ± 0.01^ 


^3 


30-120 


0.1064 ± 0.01^ 


^1 



Table 7.10. Values of the fitted as{M'^) for the fit with different ranges of and 
xq = 0.01 



related than two neighboring moments at the same scale, a large number of 
scales does not add significant information and only causes pathological ef- 
fects in the minimization of the as discussed above. Thus, for the problem 
at hand the best choice is to use 3 scales. 

The plot of the kinematic region Fig. ^]l| indicates how we have to choose x 
and Q^, if we want to avoid extrapolating outside the experimental kinematic 
region. Such a request is needed as we know (see Chapter 5) that neural 
networks are good in interpolation, but they are not very trustworthy in 
extrapolation. When xq = 0.01 the maximum allowed value of is 70 GeV^. 
Thus, in order to understand the effects of varying the range we will take 
Xo = 0.1 that allows us more flexibility in the choice of the final scale (see 
Fig. |6.1| ). Table |7.10| shows that the effect of varying the range is small 
and within the statistical errors. 



7.3 The theoretical uncertainties 

The theoretical uncertainties of a phenomenological analysis can not be ul- 
timately defined. The progress of studies may increase or decrease such 
biases. Throughout this thesis we have shown how we can avoid some of 
them, specifically: 

• with the technique of truncated moments we can avoid theoretical as- 
sumptions on the shape of parton distributions in the small-x region 
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(see Chapter 4). As we have shown the evolution uncertainty is negh- 
gible; 

• with a neural network fit we can avoid theoretical assumptions on the 
shape of the structure function F2 when we fit experimental data (see 
Chapter 6); 

• with a careful choice of the kinematical region over which we perform 
our analysis we can avoid kinematical and dynamical higher twists, as 
well as elastic contributions. 

We are then left with only two sources of uncertainties, i. e. the matching of as 
at quark mass thresholds and the effects NNLO unaccounted contributions. 

It is worth noting that since combinations with several moments may 
be sensitive to the numerical precision (see previous Section) of calculations 
and since we are looking for small effects, we prefer to adopt in the following 
analysis the truncated moments combination 2+3+5+6+7 with xq = 0.01 
that yields the same central value and the same error of combinations with 
more moments, but with a more stable numerical precision. 



We have seen in Sect. p.4.3| how we can define as in a scheme where it 
varies with the number of active quarks. We have shown also how we can 
match the different couplings to let as be continuous when it crosses quark 
mass thresholds. As result we may have a dependence on the matching 
conditions we have used to connect the couplings. As explained we will 
assess this ambiguity by requiring that as is continuous at Qj = kthmf, with 
rui the heavy quark mass. Then, we will let kfh vary from 1/2 to 2. In the 
kinematic range we have chosen, we have only the 6-quark threshold, and we 



have used rrib = 5 GeV. From Tab. [7.11| we observe that matching conditions 



effects are negligible. For completeness in Fig. [7.2| we show the effects of 
varying kth over a wider range. Almost all points lay in the systematic error 
band of the value with kth = 1 and their spread is very narrow. 

The uncertainty due to the renormalization scale has been discussed in 



previous section. Tab. [7.12| shows the results of varying kji from 1/4 to 



4. Again in Fig. |7.2| we show the effects of varying kji over a wider range. 



analogous to that used in We observe that the uncertainties are relevant. 



Note also that they are asymmetric and that we have a larger effect when 
the central value is closer to the world average. Thus, we can conclude that 
NNLO contributions play an important role in the evolution of truncated 
moments, and an analysis taking into account these contributions may be 
explored. Our final result is 

a^(M|) = 0.128 ± 0.005 istat.)toZt (theor.). (7.3.1) 
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0,5 1 




Figure 7.2. Variation of as{M'^) as a function of kf^ and with xq = 0.01, fit 



2+3+5+6+7 with theoretical systematic error hands obtained from Tables 7.11 and 



7.12 and statistical error bars on each point. 



xq = 0.01 


k 


as 


2 


0.1272 ± 0.0050 


1 


0.1284 ± 0.0051 


1/2 


0.1281 ± 0.0062 



Table 7.11. Values of the fitted as{M'^) for the fit with different quark mass thresh- 
olds and moments 2+3+5+6+7. 



xo = 0.01 



k 


as 


4 


0.1326 ± 0.0055 


1 


0.1284 ± 0.0051 


1/4 


0.1227 ± 0.0048 



Table 7.12. Values of the fitted as{M'^) for the fit with different renormalization 
scales and moments 2+3+5+6+7. 
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8 

Conclusions 



We have performed a phenomenological analysis of DIS data in the context 
of perturbative QCD with the purpose of reducing theoretical uncertain- 
ties, and taking into proper account all experimental errors and correlations. 
Specifically: 



we have extended the method of truncated moments suggested in |28 
for the case of non-singlet parton distributions to the singlet case . 
Truncated moments of parton distributions are defined by restricting 
the integration range over the Bjorken variable to an experimentally 
accessible subset xq < x < 1 of the allowed kinematic range < x < 1. 
This method provides a way to avoid theoretical biases on the shape 
of parton distributions. We have shown how to increase the numeri- 
cal efficiency of the method, and how to use it as a way to solve the 
Altarelli-Parisi evolution equations 



we have performed a fit of the non-singlet structure function data. Pre- 
liminary work on individual (proton and deuteron) structure function 
fits has also been performed. The neural network approach we have 
suggested allows to avoid sources of theoretical bias, such as the choice 
of a given shape of the fitted function, which are difficult to keep under 
control. We have adopted a Monte Carlo technique to estimate errors 
and correlations for any quantity which can be extracted from our fit, 
such as e.g. the error on F2 for given values of x and Q^, or the corre- 
lation between two Mellin moments. Our results will be published in 



41 



using the techniques outlined above, we have performed a determi- 
nation of the strong coupling constant as [0. We have taken into 



account all correlated systematic experimental errors. As a result of 
the adopted methods the theoretical bias is lower than in the case of 
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previous analyses. Our result for is compatible within errors with 
the world average; the analysis may be refined in order to reduce the 
statistical errors. We have also pointed out the need of a more precise 
knowledge of NNLO contributions when all other theoretical biases are 
correctly taken into account. 

Outlook 

The work presented in this thesis can be extended in different directions. 

The neural network fit of F2 should be extended to include HERA, and 
E665 data. This would cover the whole experimental range presently ex- 
plored. In particular, this would allow a detailed analysis of the small-a; 
region. An analogous fit of the structure functions with deep inelastic neu- 
trino scattering data may be performed as well. 

The approach we have suggested to fit the unpolarized structure functions 
may also be adopted to fit the polarized structure function gi. A neural 
network fit of gi would allow an analysis of the spin structure of the nucleon, 
as well as an independent extraction of ctg. 

Finally, a determination of ccg may be performed with truncated moments 
of singlet parton distributions. In this case, the theoretical uncertainties are 
generally larger than in the non-singlet case, but on the other hand the 
amount of available data is also larger. 
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A 

Diagonalization of triangular 
matrices 



In this Appendix, we show how to construct the matrix R which diagonahzes 
a generic n x n triangular matrix T by means of the recursion relations 
eqs. ( [4.1.16 , 4.1.17 ). The matrix R is defined by the requirement that 

i?Ti?-i = diag(7i,...,7n) , (A.1.1) 

where the matrix T is upper triangular, i.e. Tij = ii i > j. It is easy to see, 
by solving the secular equation, that the eigenvalues 7^ of T coincide with its 
diagonal elements, 

7^ = Tu. (A. 1.2) 

Now, define eigenvectors associated to the j^^ eigenvalue Tjj, with com- 
ponents Vi^: 

n 

'^TikVk^ = jjVi^ . (A.1.3) 

k=l 

Clearly, the matrix R~^ coincides with the matrix of right eigenvectors, 
{R~^)ij = Vi^ , while the matrix R coincides with the matrix of left eigenvec- 
tors 'O-'fcTfci = 'jjV-'i, Rij = v'^j- The eigenvector condition eq. ( [A.1.3| ) 
immediately implies that the j*^ component of the j*'' eigenvector is equal to 
one: Vj^ = 1. Furthermore, it is clear that eq. ( |A.1.3D can only be satisfied if 
all components Vk-^ of the j*'^ eigenvector with k > j vanish, 

v/ = 1 ; = if k> J . (A.1.4) 

Using eq. ( |A.1.4[ ) and the fact that the matrix T is triangular, eq. ( [A. 1.31 ) 
can be written as 

j 

y~] TjkVk^ =ljVi' . (A. 1.5) 

k=i 
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Substituting the explicit form of the eigenvalues, eq. ( |A.1.2| ), and identifying 
Vi^ = {R~^)ij, this is immediately seen to coincide with eq. (^4.1.17 ). Further- 
more, using the condition vj^ = 1, this equation can be viewed as a recursion 
relation which allows the determination of the {k — 1)*'' element of once 
the k*^^ element is known, which is what we set out to prove. The same argu- 
ment, applied to the left eigenvectors, leads to the expression in eq. ( [4.1.16| ) 
for R. 
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Tools of statistics 



The aim of this appendix is to report basic and useful definitions of statistic 
and introduce the notations adopted in the thesis. Here we will follow ^ . 

B.l Distribution of several random variables 

We only consider the case of two continuous variables (X and Y). The ex- 
tension to more variables is straightforward. The infinitesimal element of 
probability is dF{x,y) = f{x,y) dx dy, and the probability density function 

d^F{x,y) 

= ^^- (B.1.1) 
The probability of finding the variable inside a certain area A is 

/ f{x,y)dxdy. (B.1.2) 

J A 

Expected Value 

/oo 
xf{x,y)dxdy, (B.1.3) 
■oo 

and analogously for Y. Generally, 

/oo 
g{x,y)fix,y)dxdy. (B.1.4) 
■oo 

Variance 

/oo 
{x - ;u) y) dx dy = E[X^] - E[X]'^ , (B.l. 5) 
■oo 

and analogously for Y. 
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Coveiriance 



VxY = cov[X,Y]=E[(X-E[X])(Y-E[Y])] = E[XY]-//x/^Y 

xyf{x, y) dx dy - /ixl^v , (B.1.6) 



J — t 



where ^x = E[X] and hy = E[Y]. The covariance matrix is also called 
the error matrix, as Vxx = = (y\. If X and Y are independent, then 

= and hence Vxy = (the opposite is true only if X and 

Y have the same normalization). 

Correlation coefficient 

,XY^^^B^-^, (B.1.7) 
^V[X]V[Y] axay' ^ ' 

The correlation coefficient gives a dimensionless measure of the level of cor- 
relation between two random variables X and Y. One can show that the 
correlation coefficient lies in the range — 1 < pxY < 1- 

Linccir combination of random variables 

If y = Ylii CiXi with Cj real, then 

^XY = £;[y] = = (b.i.s) 

i i 

V[Y] = 4 = J]Ql^[X,] + 2^QC,cov[Xi,Xj] 

i i<j 

= ^CiCjaij, (B.1.9) 

i 

where cr^j = cov[Xi,Xj] = Vy, an = V[Xi] and pu = 1. 

B.2 Gaussian distribution 

The N-dimensional Gaussian distribution is defined by 

^ (B.2.1) 



where X and n are column vectors containing Xi, . . . , X^ and pi, . . . , p^, 
and are the corresponding row vectors, and |V^| is the determinant of 
a symmetric N x N matrix V, thus containing N{N + l)/2 parameters. 
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The importance of the Gaussian distribution stems from the central hmit 
theorem. The theorem states that the sum of independent continuous 
random variables Xi with means fii and variance erf becomes a Gaussian 
random variable with mean /i — and variance — cr| in 

the hmit that N approaches infinity. This holds regardless of the form of 
the individual probability densities functions of the X^. This is the formal 
justification for treating measurements errors as Gaussian random variables, 
and holds to the extent that the total error is the sum of a large number of 
small contributions. The expectation values, variances and covariances can 
be computed to be 



B.3 Estimators for mean, variance, covariance 

Consider the case where one has made N measurements of a random variable 
X whose probability density function f{x) is not known. Our task is to infer 
properties of f{x) based on observations Xi, . . . ,xn. Specifically, we would 
like to construct functions of the Xi to estimate the various properties of 
the probability density function f{x). Usually we have a hypothesis for the 
probability density function f{x) which depends on an unknown parameter 
(or parameters 6 = {6i, . . . , 6m))- The goal is then to construct a function of 
the observed Xi to estimate parameters. 

A function of the observed measurements Xi, . . . ,xn which contains no 
unknown parameters is called statistic. In particular, a statistic used to esti- 
mate some property of a probability density function {e.g. its mean, variance 
or other parameters) is called an estimator. The estimator of a quantity 6 
is usually written with a hat, 6, to distinguish it from the true vale 6 whose 
exact value is unknown. 

If 6 converges to 9 in the hmit of large N, the estimator is said to be 
consistent. Here convergence is meant in the sense of probability: for any 
e > 0, one has 



Consistency is usually a minimum requirement for a useful estimator. Other 
features of estimators are bias (see later) and robustness, i.e. the property of 
being insensitive to departures from assumptions in the probability density 
function owing to factors such as noise. The procedure of estimating the 
value of a parameter, given the data xi, . . . ,xn, is called parameter fitting. 



E[X,] 
V[X,] 
cov[Xi,Xj] 





lim p(\e-e\ > e) = 0. 



(B.3.1) 



115 



Tools (i[ st.rilislics 



The expectation value of an estimator 6 with the sampling probability 
density function g{9, 9) is 

E[9] = J 9g{9,9)d9. (B.3.2) 

Recall that this is the expected mean value of 9 from an infinite number of 
similar experiments, each with a sample of size N. We define the bias of an 
estimator 9 as 

b = E[9]-9. (B.3.3) 

Note that the bias does not depend on the measured values of the sample but 
rather on the sample size, the functional form of the estimator and on the 
true properties of the probability density function f{x), including the true 
value of ^. A parameter for which the bias is zero independent of the sample 
size N is said to be unbiased; if the bias vanishes in the limit N ^ oo then 
it is said to be asymptotically unbiased. 

We now consider the case where one has a sample of size of a random 
variable X, {xi, . . . ,xn). It is assumed that X is distributed according to 
some probability density function f{x) which is not known, not even as a 
parametrization. We would like to construct a function of the Xi to be an 
estimator for the expectation value of X, /i. One possibility is the arithmetic 
mean of the Xi, defined by 



1 ^ 



i=l 



The arithmetics mean of the elements of a sample is called the sample mean, 
and is denoted by (x) (where we generally omit the index A^) or by a bar, 
e.g. X. This should not be confused with the expectation value (population 
mean or central value of f{x)) oi x, denoted by /i or E[x], for which (x) is 
an estimator. The fist important property of the sample mean is given by 
the weak law of large numbers. This states that if the variance of x exists, 
then (x) is a consistent estimator for the population mean /j,. That is, for 
N ^ oo, (x) converges to fi in the sense of probability. 

The expectation values of the sample mean -^[(a;)] is given by 



E[{x)] = E 



1 " 1 ^ 

^E^^ =l^E^N=/^, (B.3.5) 



i=l 



for all i. Thus one can see that the sample mean (x) is an unbiased estimator 
for the population mean 
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If the mean fi is known, then the quantity o"^ defined by 

1 ^ 

—-J2ix,-fiy = {x'')-fi\ (B.3.6) 



N 

N : 

i=l 



is an unbiased estimator of the variance a^. In a similar way one can show 
that the quantity 

1 ^ 
1=1 

is an unbiased estimator for the covariance Vxy of two random variables X 
and Y of unknown mean. This can be normalized by the square root of the 
estimators for the sample variance to form an estimator r for the correlation 
coefficient p: 

^^^y) - ^^^^^y) (B.3.8) 



Given an estimator 9 one can compute its variance V[9\ = ElO"^] — {E[9]Y. 
Recall that V[9] (or equivalently its square root cxg) is a measure of the 
variation of 9 about its mean in a large number of similar experiments each 
with sample size N, and as such is often quoted as the statistical error of 9. 
For example, the variance of the sample mean (x) is 

V[{x)] = ^, (B.3.9) 

where is the variance of f{x). In a similar way, the variance of the esti- 
mator (T^ for a Gaussian distribution, can be computed to be 



V 



a2 



^ a\ (B.3.10) 



A^- 1 



The expectation value and variance of the estimator of the correlation coef- 
ficient for a two dimensional Gaussian are found to be 

E[r] = P - 0{N') , (B.3.11) 

V\r] = + 0{N^) ■ (B.3.12) 

The estimator r given by eq. (|B.3.8|) is only asymptotically unbiased. Thus, 
we should be very careful when applying eq. ( |B.3.12[ ) to evaluate the signifi- 
cance of an observed correlation. 
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B.4 Least squares minimization 

Consider now a set of N independent Gaussian random variables rrii, i = 
1, . . . ,N, each related to another variable Xi, which is assumed to be known 
without error. Assume that each value Wj has a different unknown mean, ti, 
and a different but known variance, cTj. The N measurements of rrii can be 
equivalently regarded as a single measurement of an N-dimensional random 
vector, for which the join probability distribution function is the product of 
N Gaussians, 

f{mi,...,mN;ti,...,tN,crl,---,crN) = (B.4.1) 

We suppose also that the true value is given as a function of x, t = t{x; 0), 
which depends on unknown parameters 6 = (6*1, .. . ,9n)- The aim of the 
method of the least squares is to estimate the parameters 6. In addition, the 
method allows for a simple evaluation of the goodness-of-fit of the hypnotized 
function t{x; 6). 

Taking the logarithm of the joint probability function and dropping ad- 
ditive terms that do not depend on the parameters gives the log-likelihood 
function, 

1=1 ' 

This is maximized by finding the values of the parameters 6 that minimize 
the quantity 

X^(e) = E^3^. (B.4.3) 

1=1 « 

namely the quadratic sum of differences between measured and hypnotized 
values, weighted by the inverse of variances. This is the basis of the method 
of least squares (LS), and it it used to define the procedure even in cases 
where individual measurement are not Gaussian, but as long as they are 
independent. 

If the measurements are not independent but can be described by an N- 
dimensional Gaussian probability distribution function with known covari- 
ance matrix V but unknown mean values, the corresponding log-likelihood 
function is obtained from the logarithm of the join probability function given 
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by eq. (|Bg 

1 ^ 

\ogL{0) = -- - Uixi-, 0))Vr:\m, - t,{x,- 0)) , (B.4.4) 



where additive terms not depending on the parameters have been dropped. 
This is maximized by minimizing the quantity 



N 



X\e) = J2 - t^{x^■, e))V,j\m, - t,{xf, 0)) (B.4.5) 



which reduces to eq. ( |B.4.3| ) if the covariance matrix (and hence its inverse) 
are diagonal. The parameter that minimize the ^^^6 called the LS estima- 
tors, 9i, . . . , 9m- The explicit expression of V can be found by taking 



ti{6) +riai + SkAjk 



[BA.6) 



where is the measurement of data point i, ti{6) is the model prediction 
depending on a set of parameters 6, ai is the uncorrelated (statistical) error 
on data point i and Aj^ is the correlated (systematic) error from source k. 
In eq. (|B.4.6|) , r, and Sk denote Gaussian random variables with zero mean 
and unit variance. These random variables are assumed to be independent 
of each other: 



(AriAr,) = (As.As,) = Sij (Ar^As,) = . 



^B.4.7) 



From eqs. ( |B.4.6 ) and (B.4.7) the covariance matrix V of the measurements 
is given by 



Vij = {AmiAnij) = dija'l + ^ A^^A 



■jk 



(B.4.8) 



Minimizing defined by eq. ( |B.4.5[ ) is impractical because it involves 
the inversion of the measurement covariance matrix ( p.4.8| ) which, in global 
fits, tends to become very large. Because the systematic errors of different 
data sets are in general uncorrelated (but not always, see [^) this matrix 
takes a block diagonal form and each block could, in principle, be inverted 
once and for all. However, the dimension of these block matrices can still 
easily be larger than a few hundred. Furthermore, if the systematic errors 
dominate, the covariance matrix might, numerically, be uncomfortably close 
to a matrix with the simple structure Vij = AjA^, which is singular. 
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Fortunately, the of (|B.4.5D can be cast in an alternative form which 



avoids the inversion of large matrices [ 3S 



X 



m.: — ti 



B B 



5fe = ^ Ajfc {mi -ti)/ al 

i 

Aki = Ski + y^^AikAii/aj. 



[B.4.9) 



The matrix A in eq. (p.4.9|) has the dimension of the number of systematic 
sources only and can be inverted at the initialization phase of a fitting pro- 
gram once the number of data points included in the fit {i.e. after cuts) is 
known. An example of a global QCD fit with error calculations based on the 



covariance matrix approach can be found in ^6 

It is remarkable that minimizing eq. (|B.4.9D is equivalent to a fit where 
both the parameters 6 and s are left free. In such a fit is defined as 
follows. First, the effect of the systematic errors is incorporated in the model 
prediction 



f,{d,s) = ti{e) + Y,SkAik. 



(B.4.10) 



Next, X is defined by 



X 



rrii 



(B.4.11) 



The second term in eq. ( p.4.11|) serves to constrain the fitted values of s. 
The presence of this term is easily understood if one takes the view that the 
calibration of each experiment yields a set of 'measurements' = ± 1 |p9 



Because / is linear in s the minimization with respect to the systematic 
parameters can be done analytically. It is easy to show, by solving the 
equations dx^/ddsk = 0, that this leads to the given by eq. (p.4.9|) which, 
in turn, is equivalent to eq. ( [B.4.5| ), see The relation between the optimal 
values of s, the matrix A and the vector B of eq. ( p.4.9|) is 



A^B 



(B.4.12) 



Remarks 



If we have data with correlated systematic uncertainties, the minimiza- 
tion of eq. ( p.4.3|) called also the simplest x^ estimator (SCE), is an 
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estimator of the given by eq. ( |B.4.5D , covariance matrix estimator 
(CME). It can be shown that the value of the error due to im- 
proved statistical precision for the SCE does not necessarily decrease 
after adding a new point. Qualitatively, for SCE the fitted curve tightly 
follows the data points and, if these points are shifted due to the sys- 
tematic errors fluctuations, the parameter gains appropriate systematic 
errors. At the same time, since for the CME the information on data 
correlations is explicitly included in the correlated fluctuation of 
data due to systematic shift does not necessary leads to the fitted curve 
shift and parameter deviation gets smaller than for SCE. 

If we have strong correlations, the correlation matrix is almost singular 
and we may find pathological results. Eq. ( |B.4.5| ) can be written as 



N 



X 



E 



rrii 



(B.4.13) 



If we take as an example the case with only two data, we have 



P 



1 1-5 
1-5 1 

1 / 1 



-1 + 5 



25-52 V -1 + 5 1 



(B.4.14) 



Thus, we get 



X 



25-52 



mi — ti 



(mi -ti)(m2 -12) 



cricr2 



1712 - t2 
(^2 



(B.4.15) 



that can be zero not only if rrii = t^. Notice also that the covariance 
matrix has N diagonal terms and A^(A^ — 1) off-diagonal ones. If off- 
diagonal terms are large, their effect may be stronger than that of the 
diagonal terms. An example of such a pathological result is shown in 
Fig. 
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best fit of q3(x0,Q ) 



30 40 50 60 70 80 



Figure B.l: Pathological fit of the second truncated moment with xq — 0.2 
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Numerical routines 



Here we reproduce the codes given in |^9[ that we have used to generate 
random numbers. The random numbers generator rani is very trustful as it 
is not known any statistical test that it fails to pass, except when the number 
of calls starts to become on the order of the period m, say > 10*. For our 
applications the maximum number of calls is less than 10^. 

FUNCTION ranKidum) 
INTEGER idum , lA , IM , IQ , IR , NTAB , NDI V 
REAL ranl,AM,EPS,RNMX 
PARAMETER (IA=16807 , IM=2147483647 , 
*AM=1 . /IM, IQ=127773 , IR=2836 , 

*NTAB=32 , NDIV=1+ (IM-1) /NTAB , EPS=1 . 2e-7 , RNMX=1 . -EPS) 
INTEGER j ,k,iv(NTAB) ,iy 
SAVE iv,iy 

DATA iv /NTAB*0/, iy /O/ 

if (idum. le . . or . iy . eq. 0) then 

idum=max(-iduin, 1) 

do 11 j=NTAB+8,l,-l 
k=idum/IQ 

iduni=IA*(iduin-k*IQ)-IR*k 
if (idum. It. 0) idum=idum+IM 
if (j.le.NTAB) iv(j)=idum 
11 continue 
iy=iv(l) 
endif 
k=idum/IQ 

idum=IA*(idum-k*IQ)-IR*k 
if (idum. It. 0) idum=idum+IM 
j=l+iy/NDIV 
iy=iv(j) 
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iv(j)=idum 
ranl=niin(AM*iy , RNMX) 
return 
END 

The routine gasdev simply returns normally distributed Gaussian random 
numbers with zero mean and unit variance, using ranl(idum) as the source 
of random numbers. 

FUNCTION gasdevCidum) 
INTEGER idum 
REAL gasdev 
INTEGER iset 

REAL fac,gset,rsq,wl,w2,ranl 
SAVE iset,gset 
DATA iset/0/ 
if (iset.eq.O) then 
1 wl=2.*ranl(idum)-l. 

w2=2 . *ranl (idum) -1 . 

rsq=wl**2+w2**2 

if (rsq.ge . 1 . . or .rsq. eq. . )goto 1 

f ac=sqrt (-2 . *log(rsq) /rsq) 

gset=wl*f ac 

gasdev=w2*f ac 

iset=l 
else 

gasdev=gset 

iset=0 
endif 
return 
END 

The idirty generator is a quick random number generator used to somewhat 
randomize things. What we need in the training of neural networks is to 
process data from experiments not always in the same order to avoid a bias 
on the training (see Chapter 5). Here the period is 6075. 

jran=mod(jran*106+1283,6075) 
idirty= j lo+ ( ( jhi- j lo+l ) * j ran) /6075 
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